sci-ml/llama-cpp

In most cases, it is possible to build and use multiple backends at the same time. For example, you can build llama.cpp with both CUDA and Vulkan support by using the -DGGML_CUDA=ON -DGGML_VULKAN=ON options with CMake. At runtime, you can specify which backend devices to use with the --device option. To see a list of available devices, use the --list-devices option.
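
For instance, a combined CUDA + Vulkan build and a run pinned to one device might look like the following sketch (the binary path, model.gguf, and the CUDA0 device name are illustrative placeholders; use --list-devices to see the actual names on your machine):

```sh
# configure and build with both the CUDA and Vulkan backends enabled
cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=ON
cmake --build build --config Release

# show which backend devices the binary can see
./build/bin/llama-cli --list-devices

# run on one specific device, using a name from the list above
./build/bin/llama-cli -m model.gguf --device CUDA0
```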
Backends can also be built as dynamic libraries that are loaded at runtime. This allows you to use the same llama.cpp binary on different machines with different GPUs. To enable this feature, use the GGML_BACKEND_DL option when building.
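
A minimal sketch of such a build, assuming the backend flags from above are combined with GGML_BACKEND_DL (each enabled backend then ends up in its own loadable library alongside the binary):

```sh
# build the CUDA and Vulkan backends as runtime-loadable modules
cmake -B build -DGGML_BACKEND_DL=ON -DGGML_CUDA=ON -DGGML_VULKAN=ON
cmake --build build --config Release
```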
GGML_NATIVE (the "cpu-native" flag) is not compatible with GGML_BACKEND_DL. If you also want CPU-specific optimizations, consider using GGML_CPU_ALL_VARIANTS instead.
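
A sketch of that combination (GGML_CPU_ALL_VARIANTS is used together with GGML_BACKEND_DL, and GGML_NATIVE is switched off explicitly here on the assumption that the two conflict, as described above):

```sh
# build several CPU backend variants; the best match for the host
# CPU is picked at load time instead of baking in native flags
cmake -B build -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_NATIVE=OFF
cmake --build build --config Release
```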