All USE Flags
-
X - 1 package(s)
-
accelerate - 1 package(s)
ggml: enable Accelerate framework
-
amd - 2 package(s)
build for AMD GPUs (ROCm-based)
-
amd_mae - 2 package(s)
enable experimental memory efficient attention on some AMD GPUs
-
android - 2 package(s)
Enable if building for Android.
-
apache - 1 package(s)
Create apache configuration.
-
blas - 1 package(s)
ggml: use BLAS; for using specific vendor check https://wiki.gentoo.org/wiki/Blas-lapack-switch
-
blis - 1 package(s)
ggml: use BLIS ( https://github.com/flame/blis )
-
buildozer - 1 package(s)
Cross-compile Kivy apps via dev-python/buildozer (recommended)
-
cann - 1 package(s)
ggml: use CANN. This provides NPU acceleration using the AI cores of your Ascend NPU.
-
clang - 1 package(s)
-
comfyui - 1 package(s)
Build the built-in ComfyUI backend. If unselected, you will have to install a backend manually later.
-
cpu - 3 package(s)
build for CPU-based generation only
-
cpu-all-variants - 1 package(s)
ggml: enable the CPU backend and build all of its variants (requires GGML_BACKEND_DL); use together with the cpu flag.
-
cpu-native - 1 package(s)
ggml: enable the CPU backend; use together with the cpu flag. It adds `-march=native` by itself; keeping the CPU_FLAGS_* settings consistent is the user's responsibility.
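An illustrative /etc/portage/package.use entry (the package atom dev-ml/llama-cpp is hypothetical) showing how these flags might be combined; the cpu_flags_x86_* values should match what app-portage/cpuid2cpuflags reports for the machine:

```
# Hypothetical atom; cpu-native adds -march=native, so the cpu_flags_x86_*
# entries below must agree with the output of `cpuid2cpuflags`:
dev-ml/llama-cpp cpu cpu-native cpu_flags_x86_avx2 cpu_flags_x86_f16c cpu_flags_x86_fma
```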
-
cpu_flags_loong_lasx - 1 package(s)
-
cpu_flags_loong_lsx - 1 package(s)
-
cpu_flags_riscv_rvv - 1 package(s)
-
cpu_flags_x86_amx_bf16 - 1 package(s)
-
cpu_flags_x86_amx_int8 - 1 package(s)
-
cpu_flags_x86_amx_tile - 1 package(s)
-
cpu_flags_x86_avx - 1 package(s)
-
cpu_flags_x86_avx2 - 1 package(s)
-
cpu_flags_x86_avx512 - 1 package(s)
-
cpu_flags_x86_avx512_bf16 - 1 package(s)
-
cpu_flags_x86_avx512_vbmi - 1 package(s)
-
cpu_flags_x86_avx512_vnni - 1 package(s)
-
cpu_flags_x86_avx_vnni - 1 package(s)
-
cpu_flags_x86_f16c - 1 package(s)
-
cpu_flags_x86_fma - 1 package(s)
-
cpu_flags_x86_sse - 1 package(s)
-
cpu_flags_x86_sse2 - 1 package(s)
-
cpu_flags_x86_sse3 - 1 package(s)
-
cpu_flags_x86_sse4 - 1 package(s)
-
cpu_flags_x86_sse41 - 1 package(s)
-
cpu_flags_x86_sse42 - 1 package(s)
-
cpu_flags_x86_sse4a - 1 package(s)
-
cpu_flags_x86_ssse3 - 1 package(s)
-
cpuonly - 1 package(s)
If using systemd, adjust the systemd service parameters to ignore the GPU and use only the CPU.
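One possible sketch of such an adjustment, as a systemd drop-in (the unit name and path are illustrative; clearing CUDA_VISIBLE_DEVICES is one way to hide NVIDIA GPUs from a service):

```
# /etc/systemd/system/ollama.service.d/cpuonly.conf (illustrative unit name)
[Service]
Environment=CUDA_VISIBLE_DEVICES=
```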
-
cuda - 1 package(s)
ggml: use CUDA. For it to compile, the cpu flag must also be selected. Use the CMAKE_EXTRA_CACHE_FILE env variable (check https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify the following variables: CMAKE_CUDA_ARCHITECTURES (find the native architecture with `nvidia-smi --query-gpu=compute_cap --format=csv | tail -n 1 | sed -e 's/\.//g'`) and GGML_CUDA_PEER_MAX_BATCH_SIZE (ggml: max. batch size for using peer access, default: 128).
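A minimal sketch of that workflow, assuming a GPU that reports compute capability 8.6 (the printf stands in for the nvidia-smi output; the cache-file path and package atom are illustrative):

```shell
# On real hardware, replace the printf with:
#   nvidia-smi --query-gpu=compute_cap --format=csv
# The pipeline takes the last CSV line and strips the dot: "8.6" -> "86".
arch=$(printf 'compute_cap\n8.6\n' | tail -n 1 | sed -e 's/\.//g')
echo "$arch"

# Write a CMake cache file and point the build at it:
cat > /tmp/cuda-cache.cmake <<EOF
set(CMAKE_CUDA_ARCHITECTURES "${arch}" CACHE STRING "")
set(GGML_CUDA_PEER_MAX_BATCH_SIZE "128" CACHE STRING "")
EOF
# CMAKE_EXTRA_CACHE_FILE=/tmp/cuda-cache.cmake emerge -1 llama-cpp  # hypothetical atom
```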
-
cuda-f16 - 1 package(s)
ggml: use 16 bit floats for some calculations
-
cuda-fa-all-quants - 1 package(s)
ggml: compile all quants for FlashAttention
-
cuda-force-cublas - 1 package(s)
ggml: always use cuBLAS instead of mmq kernels
-
cuda-force-mmq - 1 package(s)
ggml: use mmq kernels instead of cuBLAS
-
cuda-graphs - 1 package(s)
ggml: use CUDA graphs (llama.cpp only)
-
cuda-no-peer-copy - 1 package(s)
ggml: do not use peer to peer copies
-
cuda-no-vmm - 1 package(s)
ggml: do not try to use CUDA VMM
-
cuda-unified-memory - 1 package(s)
ggml: CUDA unified memory: allow CUDA apps to use the unified memory architecture (UMA) to share main memory between the CPU and an integrated GPU.
-
cuda12 - 1 package(s)
-
cuda13 - 1 package(s)
-
curl - 1 package(s)
-
cython - 1 package(s)
Enable Kivy C extensions via dev-python/cython (recommended)
-
debug - 1 package(s)
-
debugger - 1 package(s)
Install the CUDA debugger
-
desktop - 3 package(s)
Create a "desktop" file (browser launcher) and add an icon.
-
disable-arm-neon - 1 package(s)
Disable Arm Neon.
Arm Neon is an advanced single instruction multiple data (SIMD) architecture extension for the Arm Cortex-A and Arm Cortex-R series of processors.
Might help in case of CUDA-related compilation errors: https://github.com/ggml-org/llama.cpp/issues/12826
-
doc - 2 package(s)
-
dynamic-backends - 1 package(s)
In most cases, it is possible to build and use multiple backends at the same time. For example, you can build llama.cpp with both CUDA and Vulkan support by using the -DGGML_CUDA=ON -DGGML_VULKAN=ON options with CMake. At runtime, you can specify which backend devices to use with the --device option. To see a list of available devices, use the --list-devices option.
Backends can be built as dynamic libraries that can be loaded dynamically at runtime. This allows you to use the same llama.cpp binary on different machines with different GPUs. To enable this feature, use the GGML_BACKEND_DL option when building.
GGML_NATIVE (the cpu-native flag) is not compatible with GGML_BACKEND_DL; if you also want native optimizations, consider GGML_CPU_ALL_VARIANTS (the cpu-all-variants flag) instead.
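For example, a hypothetical /etc/portage/package.use entry enabling dynamically loadable CUDA and Vulkan backends alongside the portable CPU variants (the atom dev-ml/llama-cpp is illustrative):

```
# dynamic-backends implies GGML_BACKEND_DL, so cpu-all-variants is used
# here instead of cpu-native:
dev-ml/llama-cpp dynamic-backends cpu cpu-all-variants cuda vulkan
```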
-
examples - 3 package(s)
Build and install example Kivy apps
-
gles2 - 1 package(s)
Enable GLES2 support
-
gstreamer - 1 package(s)
-
hbm - 1 package(s)
ggml: use memkind for CPU HBM (High Bandwidth Memory); a hardware-related feature.
-
highlight - 1 package(s)
Enable syntax highlighting support via dev-python/pygments
-
hip - 1 package(s)
ggml: use HIP
-
hip-graphs - 1 package(s)
ggml: use HIP graph, experimental, slow
-
hip-no-vmm - 1 package(s)
ggml: do not try to use HIP VMM
-
hip-uma - 1 package(s)
ggml: use the HIP unified memory architecture. From docs/build.md: on Linux it is also possible to use unified memory architecture (UMA) to share main memory between the CPU and an integrated GPU by setting -DGGML_HIP_UMA=ON. However, this hurts performance for non-integrated GPUs (but enables working with integrated GPUs).
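A sketch of the corresponding CMake cache entries for an integrated-GPU build (file contents only; passing them via CMAKE_EXTRA_CACHE_FILE follows the same pattern as the cuda flag, and the values are illustrative):

```cmake
# Illustrative cache file for a HIP build on an APU/integrated GPU:
set(GGML_HIP "ON" CACHE BOOL "")
set(GGML_HIP_UMA "ON" CACHE BOOL "")
```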
-
imaging - 1 package(s)
Enable image manipulation support via dev-python/pillow (recommended)
-
intel - 2 package(s)
build for Intel GPU (XPU)
Compatible Hardware ( https://docs.pytorch.org/docs/main/notes/get_start_xpu.html ):
Intel® Arc A-Series Graphics (CodeName: Alchemist);
Intel® Arc B-Series Graphics (CodeName: Battlemage);
Intel® Core™ Ultra Processors with Intel® Arc™ Graphics (CodeName: Meteor Lake-H);
Intel® Core™ Ultra Desktop Processors (Series 2) with Intel® Arc™ Graphics (CodeName: Lunar Lake);
Intel® Core™ Ultra Mobile Processors (Series 2) with Intel® Arc™ Graphics (CodeName: Arrow Lake-H);
Intel® Data Center GPU Max Series (CodeName: Ponte Vecchio)
-
ios - 1 package(s)
Package for iOS via dev-python/kivy-ios (currently broken)
-
ipex - 2 package(s)
build for Intel GPU (IPEX)
Compatible Hardware:
Intel® Arc™ A-Series Graphics (Intel® Arc™ A770 [Verified], Intel® Arc™ A750, Intel® Arc™ A580, Intel® Arc™ A770M, Intel® Arc™ A730M, Intel® Arc™ A550M);
Intel® Arc™ B-Series Graphics (Intel® Arc™ B580 [Verified], Intel® Arc™ B570);
Intel® Data Center GPU Max Series [Verified];
For GPUs newer than Intel® Core™ Ultra Processors with Intel® Arc™ Graphics (Meteor Lake) or Intel® Arc™ A-Series Graphics that aren't listed, please check the AOT documentation ( https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-0/ahead-of-time-compilation.html ) to see if it is supported. If so, follow instructions in the source section above to compile from source.
-
kleidiai - 1 package(s)
ggml: use kleidiai optimized kernels if applicable
-
kompute - 1 package(s)
ggml: use Kompute
-
llamafile - 1 package(s)
ggml: use LLAMAFILE
-
lto - 1 package(s)
-
metal - 1 package(s)
ggml: use Metal; use the CMAKE_EXTRA_CACHE_FILE env variable (check https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify the following variables: GGML_METAL_MACOSX_VERSION_MIN (ggml: Metal minimum macOS version) and GGML_METAL_STD (ggml: Metal standard version, the -std flag).
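An illustrative cache file for those two variables (the particular values are assumptions, not recommendations):

```cmake
# Passed via CMAKE_EXTRA_CACHE_FILE (illustrative values):
set(GGML_METAL_MACOSX_VERSION_MIN "12.0" CACHE STRING "")
set(GGML_METAL_STD "metal3.0" CACHE STRING "")
```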
-
metal-embed-library - 1 package(s)
ggml: embed Metal library
-
metal-ndebug - 1 package(s)
ggml: disable Metal debugging
-
metal-shader-debug - 1 package(s)
ggml: compile Metal with -fno-fast-math
-
metal-use-bf16 - 1 package(s)
ggml: use bfloat if available
-
msvc - 1 package(s)
Enable if you build with MSVC
-
musa - 1 package(s)
ggml: use MUSA. This provides GPU acceleration using a Moore Threads GPU.
-
nginx - 2 package(s)
Create nginx configuration.
-
nginx_mainline - 1 package(s)
Use www-servers/nginx:mainline as a web server for peertube.
-
nls - 1 package(s)
-
nsight - 1 package(s)
Install profiling and optimizing tools (nsight-compute, nsight-systems)
-
nvidia - 2 package(s)
build for NVIDIA GPUs (CUDA-based)
-
ollama - 1 package(s)
Also install Ollama to be used with this server.
-
opencl - 1 package(s)
ggml: use OpenCL. This provides GPU acceleration through OpenCL on recent Adreno GPUs.
-
opencl-embed-kernels - 1 package(s)
ggml: embed kernels
-
opencl-profiling - 1 package(s)
ggml: use OpenCL profiling (increases overhead)
-
opencl-use-adreno-kernels - 1 package(s)
ggml: use optimized kernels for Adreno
-
opengl - 1 package(s)
-
openmp - 2 package(s)
ggml: use OpenMP
-
pango - 1 package(s)
Enable support for x11-libs/pango
-
profiler - 1 package(s)
Install the NVIDIA CUDA profiler (nvprof) and the related libraries
-
pygame - 1 package(s)
Enable SDL2 support via dev-python/pygame
-
pytest - 1 package(s)
Enable downstream "kivy.tests" Kivy app testing via dev-python/pytest
-
python_single_target_python3_10 - 3 package(s)
Build for Python3.10
-
python_single_target_python3_11 - 5 package(s)
Build for Python3.11
-
python_single_target_python3_12 - 5 package(s)
Build for Python3.12
-
python_single_target_python3_13 - 4 package(s)
Build for Python3.13
-
python_targets_python3_10 - 1 package(s)
-
python_targets_python3_11 - 1 package(s)
-
python_targets_python3_12 - 1 package(s)
-
rdma - 1 package(s)
Enable InfiniBand support via sys-cluster/rdma-core
-
rdna2 - 2 package(s)
build for AMD GPUs (together with the amd flag), for the 6700, 6600, and possibly other RDNA2 or older cards
-
rdna3 - 2 package(s)
build for AMD GPUs (together with the amd flag), for the AMD 7600 and possibly other RDNA3 cards
-
rpc - 1 package(s)
ggml: use RPC
-
rst - 1 package(s)
Enable reStructuredText (reST) support via dev-python/docutils
-
sanitizer - 1 package(s)
Install compute-sanitizer tool
-
sdl - 1 package(s)
-
server - 1 package(s)
ggml: build examples; llama: build the server example.
-
spell - 1 package(s)
-
startup-notification - 1 package(s)
-
static - 1 package(s)
Static build.
-
system-glib - 1 package(s)
-
systemd - 7 package(s)
Create a systemd service "swarmui.service"
-
test - 2 package(s)
ggml: build tests; "llama: build tests"
-
udev - 1 package(s)
-
utils - 1 package(s)
Also build the llama.cpp utils (scripts) for performing operations on models.
-
vim-syntax - 1 package(s)
-
vis-profiler - 1 package(s)
Install the NVIDIA CUDA visual profiler (nvvp)
-
vulkan - 1 package(s)
ggml: use Vulkan; use the CMAKE_EXTRA_CACHE_FILE env variable (check https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify GGML_VULKAN_SHADERS_GEN_TOOLCHAIN (ggml: toolchain file for vulkan-shaders-gen).
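When cross-compiling, that toolchain variable can be supplied the same way; a sketch (the path is illustrative):

```cmake
# Passed via CMAKE_EXTRA_CACHE_FILE; only needed when vulkan-shaders-gen
# must be built with a host toolchain different from the target one:
set(GGML_VULKAN_SHADERS_GEN_TOOLCHAIN "/path/to/host-toolchain.cmake" CACHE FILEPATH "")
```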
-
vulkan-check-results - 1 package(s)
ggml: run Vulkan op checks
-
vulkan-debug - 1 package(s)
ggml: enable Vulkan debug output
-
vulkan-memory-debug - 1 package(s)
ggml: enable Vulkan memory debug output
-
vulkan-perf - 1 package(s)
ggml: enable Vulkan perf output
-
vulkan-run-tests - 1 package(s)
ggml: run Vulkan tests
-
vulkan-shader-debug-info - 1 package(s)
ggml: enable Vulkan shader debug info
-
vulkan-validate - 1 package(s)
ggml: enable Vulkan validation
-
wayland - 1 package(s)
-
webgpu - 1 package(s)
The WebGPU backend relies on Dawn: https://dawn.googlesource.com/dawn