A lightweight open-source platform for running, managing, and integrating large language models locally via a simple CLI and REST API.
Ollama lets developers pull, run, and customize state-of-the-art open-source LLMs such as Llama 3, Qwen, and Gemma directly on macOS, Linux, and Windows machines. Its Go-based runtime provides a command-line interface (`ollama run`, `ollama list`, etc.) and an OpenAI-compatible REST API, making local models drop-in replacements for cloud endpoints. Beyond basic chat completion, Ollama supports embeddings, tool/function calling, structured JSON outputs, streaming responses, and multi-modal vision models. The project ships pre-built binaries with GPU acceleration (NVIDIA, AMD, Apple Silicon) and can also run in Docker. A growing model library and Python/JavaScript client SDKs simplify integration into RAG pipelines, VS Code extensions, and other AI-powered apps. Founded by Jeffrey Morgan and Michael Chiang (YC W21), Ollama is fully open source under the MIT license and has an active community on GitHub and Discord.
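Because the REST API mirrors the OpenAI schema, existing OpenAI client code can usually target a local model by changing little more than the base URL. A minimal sketch, assuming Ollama is running on its default port (11434) and a model named `llama3` has already been pulled:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the client library, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3",  # assumes this model was pulled with `ollama pull llama3`
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)
```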
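The Python SDK (`pip install ollama`) wraps the same API and makes streaming a one-flag change. A short sketch, again assuming a local server and a pulled `llama3` model:

```python
import ollama

# With stream=True, chat() yields partial responses as the model
# generates tokens, instead of returning one completed message.
stream = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```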
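For RAG pipelines, the embeddings endpoint turns documents into vectors that can be indexed in any vector store. A hedged sketch, assuming an embedding-capable model such as `nomic-embed-text` has been pulled locally:

```python
import ollama

docs = [
    "Ollama runs large language models locally.",
    "Embeddings map text to dense numeric vectors.",
]

# One embedding vector per document; the model name is an assumption
# and can be swapped for any embedding model in the Ollama library.
vectors = [
    ollama.embeddings(model="nomic-embed-text", prompt=doc)["embedding"]
    for doc in docs
]
print(len(vectors), "vectors of dimension", len(vectors[0]))
```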