A high-throughput and memory-efficient inference and serving engine for LLMs
The all-in-one AI productivity accelerator. On-device and privacy-first, with no annoying setup or configuration.