Distribute and run LLMs with a single file.
Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
Maid is a free and open-source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral, and OpenAI models remotely.
LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV-cache compression in a single-header library.
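The key property named above is that the KV-cache compression is lossless: the cached key/value tensors must round-trip bit-for-bit, so the model sees an identical cache after decompression. The project itself is pure C and its actual format is not shown here; the following is only a conceptual Python sketch using a general-purpose lossless codec (zlib) on a toy stand-in for a KV cache.

```python
import struct
import zlib

# Toy stand-in for a KV cache: a flat list of floats with some
# redundancy, as real activation caches tend to have. This is an
# illustrative assumption, not the library's real data layout.
kv_cache = [float(i % 16) * 0.25 for i in range(1024)]

# Serialize to raw bytes (64-bit doubles), then compress losslessly.
raw = struct.pack(f"{len(kv_cache)}d", *kv_cache)
compressed = zlib.compress(raw, level=9)

# Decompression must reproduce the original bytes exactly --
# that bit-exact round trip is what "lossless" guarantees.
restored = list(struct.unpack(f"{len(raw) // 8}d", zlib.decompress(compressed)))

assert restored == kv_cache          # bit-exact round trip
assert len(compressed) < len(raw)    # redundant data shrinks
print(f"{len(raw)} bytes -> {len(compressed)} bytes")
```

A real implementation would exploit the structure of the tensors rather than a generic byte-level codec, but the correctness contract is the same: decompress(compress(cache)) must equal the original cache exactly.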
Go with your own intelligence: Go applications that directly integrate llama.cpp for local, hardware-accelerated inference.
Interface for OuteTTS models.