Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
Distribute and run LLMs with a single file.
Go with your own intelligence - Go applications that directly integrate llama.cpp for local inference using hardware acceleration.
Maid is a free and open-source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral, and OpenAI models remotely.
LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.
Interface for OuteTTS models.