AIAny - LEANN

LEANN: A Low-Storage Vector Index for Personal AI

LEANN is an open-source vector database that turns an ordinary laptop into a Retrieval-Augmented Generation (RAG) system for personal AI. It indexes and searches millions of documents, spanning file systems, emails, browser histories, chat logs (WeChat, iMessage, ChatGPT, Claude), and live data from platforms like Slack and Twitter, while using 97% less storage than traditional vector databases like FAISS, without sacrificing search accuracy.

Core Innovation

Traditional vector databases store a full high-dimensional embedding for every data chunk, which leads to large storage overhead (e.g., 201GB for 60M Wikipedia chunks). LEANN avoids this cost with graph-based selective recomputation:

Stores a compact, pruned proximity graph instead of embeddings.
Recomputes embeddings on-demand only for nodes in the search path.
Uses high-degree preserving pruning to retain 'hub' nodes for accuracy.
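
The recomputation idea in the list above can be illustrated with a small sketch. This is not LEANN's code: the embed function is a toy letter-frequency stand-in for a real embedding model, and the chunks, graph, and search names are hypothetical. It only shows the principle that a best-first graph search needs embeddings for the nodes it actually visits, so none have to be stored.

```python
import heapq
import math

def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: L2-normalised letter frequencies."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def distance(a: list[float], b: list[float]) -> float:
    """Cosine distance between two unit vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def search(query, chunks, graph, entry, top_k=1, ef=2):
    """Best-first graph search that embeds a chunk only when the search reaches it."""
    q = embed(query)
    cache = {}  # node id -> distance to the query, filled in lazily

    def dist(node):
        if node not in cache:
            cache[node] = distance(q, embed(chunks[node]))  # recomputed on demand
        return cache[node]

    candidates = [(dist(entry), entry)]  # min-heap of nodes still to expand
    best = [(-dist(entry), entry)]       # max-heap holding the ef closest nodes so far
    visited = {entry}
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # nothing left that can improve the current result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest of the kept nodes
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked[:top_k]], len(cache)

# Hypothetical data: eight text chunks linked in a simple chain.
chunks = ["aaaa", "aaab", "aabb", "abbb", "bbbb", "cccc", "dddd", "eeee"]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4],
         4: [3, 5], 5: [4, 6], 6: [5, 7], 7: [6]}

ids, n_embedded = search("aaaa", chunks, graph, entry=3)
print(ids, n_embedded, "of", len(chunks), "chunks embedded")
```

In this toy run the search embeds only the chunks along its path rather than the whole collection. The trade-off is extra compute at query time in exchange for not storing any vectors.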

Backends: HNSW (default, maximum storage savings) or DiskANN (better speed-accuracy trade-off).
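
The hub-preserving pruning mentioned under Core Innovation can be sketched in a similar spirit. The version below is a deliberately simplified assumption, not LEANN's actual pruning rule: it caps every node's neighbor list, but gives the highest-degree nodes a larger cap so they stay well connected. The function name and the hub_fraction, hub_degree, and base_degree parameters are all hypothetical.

```python
def prune_preserving_hubs(graph, dist, hub_fraction=0.2, hub_degree=4, base_degree=2):
    """Cap each node's neighbor list; the highest-degree nodes get a larger cap."""
    n_hubs = max(1, int(len(graph) * hub_fraction))
    # Rank nodes by degree (ties broken by node id) and treat the top ones as hubs.
    by_degree = sorted(graph, key=lambda n: (-len(graph[n]), n))
    hubs = set(by_degree[:n_hubs])
    pruned = {}
    for node, nbrs in graph.items():
        cap = hub_degree if node in hubs else base_degree
        # Keep the closest neighbors first, up to the cap for this node.
        closest = sorted(nbrs, key=lambda nb: (dist(node, nb), nb))
        pruned[node] = closest[:cap]
    return pruned, hubs

# Hypothetical data: six points on a line, with node 0 linked to every other node.
positions = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
graph = {0: [1, 2, 3, 4, 5], 1: [0, 2, 3], 2: [0, 1, 3],
         3: [0, 2, 4], 4: [0, 3, 5], 5: [0, 3, 4]}

def line_dist(a, b):
    return abs(positions[a] - positions[b])

pruned, hubs = prune_preserving_hubs(graph, line_dist)
edges_before = sum(len(v) for v in graph.values())
edges_after = sum(len(v) for v in pruned.values())
print(hubs, edges_before, edges_after)
```

The design point is that a uniform cap would shrink the graph just as much but would also cut the links of the few nodes that many search paths route through.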

Key Benefits

🔒 Privacy-First: Zero telemetry, no cloud—your data stays local.
🪶 Ultra-Lightweight: 60M chunks in 6GB vs. 201GB; emails (780K) in 79MB.
📦 Portable: Transfer indexes across devices effortlessly.
📈 Scalable: Handles messy personal data that crashes other DBs.
✨ Accurate: Matches SOTA quality with dynamic batching and two-level search.

Dataset	FAISS	LEANN	Savings
Wiki (60M)	201GB	6GB	97%
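
The savings column follows directly from the two sizes in the table. A minimal check, using only the figures quoted above (the function name is just for illustration):

```python
def savings_percent(baseline_gb: float, compact_gb: float) -> float:
    """Share of storage saved relative to the baseline, in percent."""
    return 100.0 * (1.0 - compact_gb / baseline_gb)

wiki_savings = savings_percent(201, 6)  # Wiki (60M): 201GB with FAISS vs 6GB with LEANN
wiki_ratio = 201 / 6                    # how many times smaller the LEANN index is
print(f"{wiki_savings:.1f}% saved, {wiki_ratio:.1f}x smaller")
```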

RAG on Everything

Documents: PDF/TXT/MD with AST-aware code chunking (Python/Java/etc.).
Emails: Apple Mail (780K → 79MB).
Browser: Chrome history (38K → 6MB).
Chats: WeChat/iMessage/ChatGPT/Claude.
Live Data: MCP integration for Slack/Twitter bookmarks.
Codebases: Semantic search for Claude Code workflows.

CLI & API: Simple declarative setup; supports Ollama/OpenAI/HF LLMs.

Installation & Usage

uv pip install leann
leann build my-docs --docs ./docs
leann ask my-docs "Summarize key techniques?"

Advanced: Metadata filtering, grep search, GPU-ready.

Backed by Research

Implementation of arXiv:2506.08276 from Berkeley Sky Computing Lab. Benchmarks reproducible via uv run benchmarks/run_evaluation.py.

With 4.9K+ GitHub stars, LEANN powers private, portable AI assistants today.
