The all-in-one AI productivity accelerator. On-device and privacy-first, with no annoying setup or configuration.
A high-throughput and memory-efficient inference and serving engine for LLMs.