A high-throughput and memory-efficient inference and serving engine for LLMs
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
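
As a quick illustration, here is a minimal offline-inference sketch. It assumes the engine exposes the vLLM-style Python API (`LLM`, `SamplingParams`, `generate`); the model name is only an example checkpoint and any supported model from the list above can be substituted.

```python
from vllm import LLM, SamplingParams

# Load a supported model (example checkpoint; swap in any model listed above).
llm = LLM(model="meta-llama/Llama-3.3-70B-Instruct")

# Sampling settings for generation.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Run batched offline inference on a list of prompts.
outputs = llm.generate(["Summarize what an LLM serving engine does."], params)
print(outputs[0].outputs[0].text)
```

For online serving, the same engine can typically be started as an OpenAI-compatible HTTP server and queried with any OpenAI client; the offline API above is the shortest path to a first generation.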