A high-throughput, memory-efficient inference and serving engine for LLMs.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.