llama-swap
- Ebuilds: 1, Testing: 217 Description:
llama-swap is an OpenAI/Anthropic-compatible HTTP proxy that
starts and stops local LLM inference servers (llama.cpp, vllm,
mlx-server, etc.) on demand based on the requested model. Lets
a single API endpoint serve many models without keeping them
all resident in GPU/NPU memory.
Homepage:
https://github.com/mostlygeek/llama-swap
License: MIT