Ollama
Local-first runtime for running large language models on Linux.
Ollama bundles llama.cpp and other inference engines behind a simple
ollama run <model> interface, automatically downloading quantised
open-weight models (LLaMA, Mistral, Qwen, Gemma, and many more) and
serving them over a local HTTP API compatible with the OpenAI chat
format.
It runs on Linux with CPU, CUDA, ROCm, or Metal backends, and has become the standard tool for running LLMs locally on a developer workstation.
Install
curl -fsSL https://ollama.com/install.sh | sh
Authors
- Ollama, Inc.
