A high-throughput and memory-efficient inference and serving engine for LLMs
⏩ Ship faster with Continuous AI. Build and run custom agents across your IDE, terminal, and CI.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3, and other models.