A lightweight open-source platform for running, managing, and integrating large language models locally via a simple CLI and REST API.
Ollama lets developers pull, run, and customize state-of-the-art open-source LLMs such as Llama 3, Qwen, and Gemma directly on macOS, Linux, and Windows machines. Its Go-based runtime provides a command-line interface (`ollama run`, `ollama list`, etc.) and an OpenAI-compatible REST API, making local models drop-in replacements for cloud endpoints. Beyond basic chat completion, Ollama supports embeddings, tool/function calling, structured JSON outputs, streaming responses, and multi-modal vision models. The project ships pre-built binaries with GPU acceleration (NVIDIA, AMD, Apple Silicon) and can also run in Docker. A growing model library and Python/JavaScript client SDKs simplify integration into RAG pipelines, VS Code extensions, and other AI-powered apps. Founded by Jeffrey Morgan and Michael Chiang (YC W21), Ollama is fully open source under the MIT license and has an active community on GitHub and Discord.
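Because the REST API mirrors the OpenAI schema, existing OpenAI client code can usually target a local model by changing little more than the base URL. A minimal sketch, assuming Ollama is running on its default port (11434) and a model named `llama3` has already been pulled:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the client library, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3",  # assumes this model was pulled with `ollama pull llama3`
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)
```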
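The Python SDK (`pip install ollama`) wraps the same API and makes streaming a one-flag change. A short sketch, again assuming a local server and a pulled `llama3` model:

```python
import ollama

# With stream=True, chat() yields partial responses as the model
# generates tokens, instead of returning one completed message.
stream = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```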
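For RAG pipelines, the embeddings endpoint turns documents into vectors that can be indexed in any vector store. A hedged sketch, assuming an embedding-capable model such as `nomic-embed-text` has been pulled locally:

```python
import ollama

docs = [
    "Ollama runs large language models locally.",
    "Embeddings map text to dense numeric vectors.",
]

# One embedding vector per document; the model name is an assumption
# and can be swapped for any embedding model in the Ollama library.
vectors = [
    ollama.embeddings(model="nomic-embed-text", prompt=doc)["embedding"]
    for doc in docs
]
print(len(vectors), "vectors of dimension", len(vectors[0]))
```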