sci-ml/llama-cpp (gentooplusplus)


Package Information

Description:
Llama.cpp - LLM inference in C/C++.
Homepage:
https://github.com/ggml-org/llama.cpp
License:
MIT

Versions

Version EAPI Keywords Slot
9999 8 0
0.6335 8 0

Metadata

Maintainers

Upstream

Raw Metadata XML
<pkgmetadata>
	<maintainer type="person">
		<email>donotspamme@example.com</email>
		<name>eugeniusz-gienek</name>
		<description>To reach me, please use the GitHub issues system.</description>
	</maintainer>
	<stabilize-allarches></stabilize-allarches>
	<use>
		<flag name="systemd">Create a systemd service "llama-cpp.service"</flag>
		<flag name="dynamic-backends">In most cases, it is possible to build and use multiple backends at the same time. For example, you can build llama.cpp with both CUDA and Vulkan support by using the -DGGML_CUDA=ON -DGGML_VULKAN=ON options with CMake. At runtime, you can specify which backend devices to use with the --device option. To see a list of available devices, use the --list-devices option.
Backends can be built as dynamic libraries that can be loaded dynamically at runtime. This allows you to use the same llama.cpp binary on different machines with different GPUs. To enable this feature, use the GGML_BACKEND_DL option when building.
GGML_NATIVE (the "cpu-native" flag) is not compatible with GGML_BACKEND_DL; if you also want native CPU optimizations, consider using GGML_CPU_ALL_VARIANTS instead.</flag>
		<flag name="disable-arm-neon">Disable Arm Neon.
Arm Neon is an advanced single instruction multiple data (SIMD) architecture extension for the Arm Cortex-A and Arm Cortex-R series of processors.
Might help in case of CUDA-related compilation errors: https://github.com/ggml-org/llama.cpp/issues/12826</flag>
		<flag name="utils">Also build the llama.cpp utilities (scripts) for performing operations on models.</flag>
		<flag name="webgpu">The WebGPU backend relies on Dawn: https://dawn.googlesource.com/dawn</flag>
		<flag name="android">Enable if building for Android</flag>
		<flag name="msvc">Enable if building with MSVC</flag>
		<flag name="static">Build statically</flag>
		<flag name="cpu">ggml: enable CPU backend; if it is the only backend, please also select either "cpu-native" or "cpu-all-variants" - otherwise the package won't compile.</flag>
		<flag name="cpu-native">ggml: enable CPU backend; use together with the "cpu" flag; it adds `-march=native` by itself. Keeping CPU_FLAGS_* consistent is the user's responsibility.</flag>
		<flag name="cpu-all-variants">ggml: enable CPU backend; use together with the "cpu" flag; builds all variants of the CPU backend (requires GGML_BACKEND_DL)</flag>
		<flag name="hbm">ggml: use memkind for CPU HBM (High Bandwidth Memory); a hardware-related feature</flag>
		<flag name="accelerate">ggml: enable Accelerate framework</flag>
		<flag name="blas">ggml: use BLAS; for using specific vendor check https://wiki.gentoo.org/wiki/Blas-lapack-switch</flag>
		<flag name="blis">ggml: use BLIS ( https://github.com/flame/blis )</flag>
		<flag name="llamafile">ggml: use LLAMAFILE</flag>
		<flag name="cann">ggml: use CANN ; This provides NPU acceleration using the AI cores of your Ascend NPU.</flag>
		<flag name="cuda">ggml: use CUDA; for it to compile, the "cpu" flag must also be selected. Use the CMAKE_EXTRA_CACHE_FILE environment variable (see https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify the following variables: CMAKE_CUDA_ARCHITECTURES (use `nvidia-smi --query-gpu=compute_cap --format=csv | tail -n 1 | sed -e 's/\.//g'` to find the native architecture); GGML_CUDA_PEER_MAX_BATCH_SIZE (ggml: max. batch size for using peer access, default: 128)</flag>
		<flag name="musa">ggml: use MUSA ; This provides GPU acceleration using a Moore Threads GPU.</flag>
		<flag name="cuda-unified-memory">ggml: CUDA unified memory: allow the CUDA app to use unified memory architecture (UMA) to share main memory between the CPU and integrated GPU</flag>
		<flag name="cuda-force-mmq">ggml: use mmq kernels instead of cuBLAS</flag>
		<flag name="cuda-force-cublas">ggml: always use cuBLAS instead of mmq kernels</flag>
		<flag name="cuda-f16">ggml: use 16 bit floats for some calculations</flag>
		<flag name="cuda-no-peer-copy">ggml: do not use peer to peer copies</flag>
		<flag name="cuda-no-vmm">ggml: do not try to use CUDA VMM</flag>
		<flag name="cuda-fa-all-quants">ggml: compile all quants for FlashAttention</flag>
		<flag name="cuda-graphs">ggml: use CUDA graphs (llama.cpp only)</flag>
		<flag name="hip">ggml: use HIP</flag>
		<flag name="hip-graphs">ggml: use HIP graph, experimental, slow</flag>
		<flag name="hip-no-vmm">ggml: do not try to use HIP VMM</flag>
		<flag name="hip-uma">ggml: use HIP unified memory architecture (UMA); from docs/build.md: On Linux it is also possible to use UMA to share main memory between the CPU and integrated GPU by setting -DGGML_HIP_UMA=ON. However, this hurts performance for non-integrated GPUs (but enables working with integrated GPUs).</flag>
		<flag name="vulkan">ggml: use Vulkan; use the CMAKE_EXTRA_CACHE_FILE environment variable (see https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify the following variable: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN (ggml: toolchain file for vulkan-shaders-gen)</flag>
		<flag name="vulkan-check-results">ggml: run Vulkan op checks</flag>
		<flag name="vulkan-debug">ggml: enable Vulkan debug output</flag>
		<flag name="vulkan-memory-debug">ggml: enable Vulkan memory debug output</flag>
		<flag name="vulkan-shader-debug-info">ggml: enable Vulkan shader debug info</flag>
		<flag name="vulkan-perf">ggml: enable Vulkan perf output</flag>
		<flag name="vulkan-validate">ggml: enable Vulkan validation</flag>
		<flag name="vulkan-run-tests">ggml: run Vulkan tests</flag>
		<flag name="kompute">ggml: use Kompute</flag>
		<flag name="metal">ggml: use Metal; use the CMAKE_EXTRA_CACHE_FILE environment variable (see https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify the following variables: GGML_METAL_MACOSX_VERSION_MIN (ggml: Metal minimum macOS version); GGML_METAL_STD (ggml: Metal standard version, -std flag)</flag>
		<flag name="metal-use-bf16">ggml: use bfloat if available</flag>
		<flag name="metal-ndebug">ggml: disable Metal debugging</flag>
		<flag name="metal-shader-debug">ggml: compile Metal with -fno-fast-math</flag>
		<flag name="metal-embed-library">ggml: embed Metal library</flag>
		<flag name="openmp">ggml: use OpenMP</flag>
		<flag name="rpc">ggml: use RPC</flag>
		<flag name="opencl">ggml: use OpenCL; This provides GPU acceleration through OpenCL on recent Adreno GPU.</flag>
		<flag name="opencl-profiling">ggml: use OpenCL profiling (increases overhead)</flag>
		<flag name="opencl-embed-kernels">ggml: embed kernels</flag>
		<flag name="opencl-use-adreno-kernels">ggml: use optimized kernels for Adreno</flag>
		<flag name="test">ggml: build tests; "llama: build tests"</flag>
		<flag name="examples">ggml: build examples; "llama: build examples"</flag>
		<flag name="server">llama: build the server example</flag>
		<flag name="kleidiai">ggml: use kleidiai optimized kernels if applicable</flag>
	</use>
	<upstream>
		<changelog>https://github.com/ggml-org/llama.cpp/releases</changelog>
		<doc>https://github.com/ggml-org/llama.cpp/wiki</doc>
		<bugs-to>https://github.com/ggml-org/llama.cpp/issues</bugs-to>
		<remote-id type="github">ggml-org/llama.cpp</remote-id>
	</upstream>
</pkgmetadata>
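The CMAKE_EXTRA_CACHE_FILE workflow referenced in the "cuda" and "vulkan" flag descriptions can be sketched as follows. This is a sketch, not part of the ebuild: the cache-file path and the architecture value "86" are hypothetical, and cache files use CMake `set(... CACHE ...)` syntax.

```shell
# Find the native CUDA architecture, as suggested in the "cuda" flag description:
nvidia-smi --query-gpu=compute_cap --format=csv | tail -n 1 | sed -e 's/\.//g'

# Write a CMake cache file (hypothetical path) holding the extra variables:
cat > /etc/portage/llama-cpp-cuda.cmake <<'EOF'
set(CMAKE_CUDA_ARCHITECTURES "86" CACHE STRING "CUDA arch from nvidia-smi")
set(GGML_CUDA_PEER_MAX_BATCH_SIZE "128" CACHE STRING "peer-access batch size")
EOF

# Point cmake.eclass at it when emerging:
CMAKE_EXTRA_CACHE_FILE=/etc/portage/llama-cpp-cuda.cmake emerge sci-ml/llama-cpp
```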

USE Flags

Flag Description 9999 0.6335
accelerate ggml: enable Accelerate framework
android Enable if building for Android
blas ggml: use BLAS; for using specific vendor check https://wiki.gentoo.org/wiki/Blas-lapack-switch
blis ggml: use BLIS ( https://github.com/flame/blis )
cann ggml: use CANN ; This provides NPU acceleration using the AI cores of your Ascend NPU.
cpu ggml: enable CPU backend; if it is the only backend, please also select either "cpu-native" or "cpu-all-variants" - otherwise the package won't compile.
cpu-all-variants ggml: enable CPU backend; use together with the "cpu" flag; builds all variants of the CPU backend (requires GGML_BACKEND_DL)
cpu-native ggml: enable CPU backend; use together with the "cpu" flag; it adds `-march=native` by itself. Keeping CPU_FLAGS_* consistent is the user's responsibility.
cpu_flags_loong_lasx ⚠️
cpu_flags_loong_lsx ⚠️
cpu_flags_riscv_rvv ⚠️
cpu_flags_x86_amx_bf16 ⚠️
cpu_flags_x86_amx_int8 ⚠️
cpu_flags_x86_amx_tile ⚠️
cpu_flags_x86_avx ⚠️
cpu_flags_x86_avx2 ⚠️
cpu_flags_x86_avx512 ⚠️
cpu_flags_x86_avx512_bf16 ⚠️
cpu_flags_x86_avx512_vbmi ⚠️
cpu_flags_x86_avx512_vnni ⚠️
cpu_flags_x86_avx_vnni ⚠️
cpu_flags_x86_f16c ⚠️
cpu_flags_x86_fma ⚠️
cpu_flags_x86_sse ⚠️
cpu_flags_x86_sse2 ⚠️
cpu_flags_x86_sse3 ⚠️
cpu_flags_x86_sse4 ⚠️
cpu_flags_x86_sse41 ⚠️
cpu_flags_x86_sse42 ⚠️
cpu_flags_x86_sse4a ⚠️
cpu_flags_x86_ssse3 ⚠️
cuda ggml: use CUDA; for it to compile, the "cpu" flag must also be selected. Use the CMAKE_EXTRA_CACHE_FILE environment variable (see https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify the following variables: CMAKE_CUDA_ARCHITECTURES (use `nvidia-smi --query-gpu=compute_cap --format=csv | tail -n 1 | sed -e 's/\.//g'` to find the native architecture); GGML_CUDA_PEER_MAX_BATCH_SIZE (ggml: max. batch size for using peer access, default: 128)
cuda-f16 ggml: use 16 bit floats for some calculations
cuda-fa-all-quants ggml: compile all quants for FlashAttention
cuda-force-cublas ggml: always use cuBLAS instead of mmq kernels
cuda-force-mmq ggml: use mmq kernels instead of cuBLAS
cuda-graphs ggml: use CUDA graphs (llama.cpp only)
cuda-no-peer-copy ggml: do not use peer to peer copies
cuda-no-vmm ggml: do not try to use CUDA VMM
cuda-unified-memory ggml: CUDA unified memory: allow the CUDA app to use unified memory architecture (UMA) to share main memory between the CPU and integrated GPU
curl ⚠️
disable-arm-neon Disable Arm Neon. Arm Neon is an advanced single instruction multiple data (SIMD) architecture extension for the Arm Cortex-A and Arm Cortex-R series of processors. Might help in case of CUDA-related compilation errors: https://github.com/ggml-org/llama.cpp/issues/12826
dynamic-backends In most cases, it is possible to build and use multiple backends at the same time. For example, you can build llama.cpp with both CUDA and Vulkan support by using the -DGGML_CUDA=ON -DGGML_VULKAN=ON options with CMake. At runtime, you can specify which backend devices to use with the --device option. To see a list of available devices, use the --list-devices option. Backends can be built as dynamic libraries that can be loaded dynamically at runtime. This allows you to use the same llama.cpp binary on different machines with different GPUs. To enable this feature, use the GGML_BACKEND_DL option when building. GGML_NATIVE (the "cpu-native" flag) is not compatible with GGML_BACKEND_DL; if you also want native CPU optimizations, consider using GGML_CPU_ALL_VARIANTS instead.
examples ggml: build examples; "llama: build examples"
hbm ggml: use memkind for CPU HBM (High Bandwidth Memory); a hardware-related feature
hip ggml: use HIP
hip-graphs ggml: use HIP graph, experimental, slow
hip-no-vmm ggml: do not try to use HIP VMM
hip-uma ggml: use HIP unified memory architecture (UMA); from docs/build.md: On Linux it is also possible to use UMA to share main memory between the CPU and integrated GPU by setting -DGGML_HIP_UMA=ON. However, this hurts performance for non-integrated GPUs (but enables working with integrated GPUs).
kleidiai ggml: use kleidiai optimized kernels if applicable
kompute ggml: use Kompute
llamafile ggml: use LLAMAFILE
lto ⚠️
metal ggml: use Metal; use the CMAKE_EXTRA_CACHE_FILE environment variable (see https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify the following variables: GGML_METAL_MACOSX_VERSION_MIN (ggml: Metal minimum macOS version); GGML_METAL_STD (ggml: Metal standard version, -std flag)
metal-embed-library ggml: embed Metal library
metal-ndebug ggml: disable Metal debugging
metal-shader-debug ggml: compile Metal with -fno-fast-math
metal-use-bf16 ggml: use bfloat if available
msvc Enable if building with MSVC
musa ggml: use MUSA ; This provides GPU acceleration using a Moore Threads GPU.
opencl ggml: use OpenCL; This provides GPU acceleration through OpenCL on recent Adreno GPU.
opencl-embed-kernels ggml: embed kernels
opencl-profiling ggml: use OpenCL profiling (increases overhead)
opencl-use-adreno-kernels ggml: use optimized kernels for Adreno
openmp ggml: use OpenMP
rpc ggml: use RPC
static Build statically
systemd Create a systemd service "llama-cpp.service"
test ggml: build tests; "llama: build tests"
utils Also build the llama.cpp utilities (scripts) for performing operations on models.
vulkan ggml: use Vulkan; use the CMAKE_EXTRA_CACHE_FILE environment variable (see https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/cmake.eclass ) to specify the following variable: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN (ggml: toolchain file for vulkan-shaders-gen)
vulkan-check-results ggml: run Vulkan op checks
vulkan-debug ggml: enable Vulkan debug output
vulkan-memory-debug ggml: enable Vulkan memory debug output
vulkan-perf ggml: enable Vulkan perf output
vulkan-run-tests ggml: run Vulkan tests
vulkan-shader-debug-info ggml: enable Vulkan shader debug info
vulkan-validate ggml: enable Vulkan validation
webgpu The WebGPU backend relies on Dawn: https://dawn.googlesource.com/dawn
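The runtime device selection described under "dynamic-backends" can be sketched as follows. The --list-devices and --device options are quoted from that flag description; the binary name, model file, and device name "CUDA0" here are placeholders for whatever your build and hardware actually provide.

```shell
# With multiple backends built in (e.g. CUDA and Vulkan), list what is available:
llama-cli --list-devices

# Then pin inference to a specific backend device:
llama-cli -m model.gguf --device CUDA0 -p "Hello"
```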

Files

Manifest

Type File Size Versions
Unmatched Entries
Type File Size
DIST llama.cpp-b6335.tar.gz 25625771 bytes