llama.cpp

C/C++ inference engine for LLaMA-family models.

llama.cpp is a portable C/C++ implementation of LLM inference, focused on running quantised models efficiently on CPUs and consumer GPUs. It supports a wide range of quantisation formats (Q4_K_M, Q5_K_M, Q8_0, and many others), runs on x86 and ARM CPUs (including Apple Silicon), and provides GPU backends for CUDA, Metal, Vulkan, and OpenCL.
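
To make the idea of a quantisation format concrete, the sketch below imitates the scheme behind Q8_0: weights are grouped into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers. It is a simplified illustration in Python, not the library's code. The function names are hypothetical, the scale is kept as a Python float (the real format stores it as a 16-bit float), and rounding uses Python's built-in `round`.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_q8_0_block(values):
    """Quantise one block of floats to (scale, list of int8-range ints).

    Hypothetical helper for illustration only.
    """
    if len(values) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(values)}")
    # The largest magnitude in the block maps to the int8 extreme, 127.
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        # An all-zero block needs no scaling.
        return 0.0, [0] * BLOCK_SIZE
    # Round each scaled value and clamp it to the symmetric int8 range.
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quants


def dequantize_q8_0_block(scale, quants):
    """Recover approximate floats from a quantised block."""
    return [scale * q for q in quants]
```

With a 2-byte scale and 32 one-byte values, a block in this layout takes 34 bytes for 32 weights, or 8.5 bits per weight. The reconstruction error for each weight is at most half the block's scale.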

It is the runtime behind Ollama, LM Studio, and many other local-LLM tools, and serves as the reference implementation of the GGUF model format.
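
As a small illustration of the GGUF format, the sketch below parses the fixed-size header at the start of a GGUF file: the 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata key-value counts. It assumes a little-endian file using the version 2 or later layout. The function name `read_gguf_header` is hypothetical and not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (kv count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first bytes of a file.

    Hypothetical helper; assumes little-endian, GGUF version 2 or later.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("not enough bytes for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("missing GGUF magic; not a GGUF file")
    # "<IQQ": little-endian uint32 version, uint64 tensor count,
    # uint64 metadata key-value count, read starting after the magic.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

In practice the bytes would come from something like `open(path, "rb").read(24)`, where `path` points at a local model file. The metadata and tensor descriptions that follow the header are variable-length and are not handled here.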

License: MIT

Category: AI / ML

Website: https://github.com/ggerganov/llama.cpp

Install

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

Authors

Georgi Gerganov (creator)
llama.cpp contributors
