Distribute and run LLMs with a single file.
Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
Maid is a free and open-source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral and OpenAI models remotely.
Go with your own intelligence: Go applications that directly integrate llama.cpp for local, hardware-accelerated inference.
LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.
Interface for OuteTTS models.