Docling — Document processing and understanding for gen-AI
Docling is an open-source toolkit and library that simplifies preparing documents for generative-AI applications. It focuses on robust parsing, structured representation, and integrations that let downstream systems (LLMs, retrieval systems, agents) consume documents reliably.
Key capabilities
- Parsing of many formats: PDF, DOCX, PPTX, XLSX, HTML, images (PNG, TIFF, JPEG, ...), audio (WAV, MP3), WebVTT, and more.
- Advanced PDF understanding: page layout & reading order, table structure recovery, code/formula detection, image classification and richer layout parsing.
- OCR support: processes scanned PDFs and images with OCR pipelines to extract text and layout.
- Audio/ASR support: transcription for audio inputs and support for WebVTT track parsing.
- Unified data model: a DoclingDocument representation that expresses structure, annotations, and metadata in a consistent format and can be exported to Markdown, HTML, JSON, or DocTags (see the export sketch after this list).
- Local-first operation: supports local execution and air-gapped environments for sensitive data handling.
- MCP server: a lightweight server to connect Docling processing into agentic workflows and pipelines.
- Integrations: plug-and-play adapters for LangChain, LlamaIndex, Haystack and others to accelerate RAG and agent setups.
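For illustration, here is a minimal sketch of working with the unified data model; the file path is a placeholder and the export_to_dict helper name should be verified against the installed version:
import json
from docling.document_converter import DocumentConverter

# Convert once, then export the same DoclingDocument in several formats.
doc = DocumentConverter().convert("report.pdf").document  # placeholder path or URL
markdown = doc.export_to_markdown()  # Markdown for prompting and indexing
with open("report.json", "w", encoding="utf-8") as fp:
    json.dump(doc.export_to_dict(), fp)  # lossless JSON (assumed helper name)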
Highlights & design goals
- Developer-friendly CLI and Python API for quick conversion and experimentation.
- Extensible pipeline architecture to swap OCR engines, VLMs, or layout models (see the configuration sketch after this list).
- Focus on quality of document structure extraction (tables, code blocks, formulas), which improves downstream retrieval and LLM prompting.
- Works across major OSes (macOS, Linux, Windows) and CPU architectures (x86_64, arm64).
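As an example of that extensibility, the following sketch enables OCR and table-structure recovery and swaps the OCR engine; the class and option names (PdfPipelineOptions, TesseractOcrOptions, do_ocr, do_table_structure) follow the documented pipeline-options pattern but may differ between releases:
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, TesseractOcrOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Enable OCR for scanned pages and table structure recovery,
# and swap the default OCR engine for Tesseract.
pipeline_options = PdfPipelineOptions(do_ocr=True, do_table_structure=True)
pipeline_options.ocr_options = TesseractOcrOptions()

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = converter.convert("scanned.pdf")  # placeholder path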
Typical usage
- Convert a PDF or URL to a DoclingDocument and export to Markdown/JSON for indexing into a retriever.
Example (Python):
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
CLI example:
docling https://arxiv.org/pdf/2206.01062
You can run with a visual-language model pipeline or enable OCR and other options via CLI flags.
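For the visual-language model path, a minimal Python sketch follows; the VlmPipeline class and pipeline_cls option are taken from recent Docling releases and should be checked against the current docs:
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline  # assumed module path

# Route PDF conversion through the VLM pipeline instead of the default layout models.
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline)}
)
result = converter.convert("https://arxiv.org/pdf/2206.01062")
print(result.document.export_to_markdown())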
Integrations & ecosystem
- Integrates with LangChain, LlamaIndex, Haystack, and other agent/LLM frameworks for retrieval-augmented generation and agentic document use (a minimal hand-off sketch follows this list).
- Supports visual-language models (e.g., GraniteDocling) and can be combined with VLMs hosted on Hugging Face or local model runtimes.
- Provides output formats tailored for downstream pipelines (lossless JSON, Markdown, HTML, DocTags).
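The dedicated adapters handle this hand-off for you; purely as an illustration of the basic pattern, the sketch below converts a document with Docling and wraps the Markdown export in a LangChain Document (the metadata fields are arbitrary choices):
from docling.document_converter import DocumentConverter
from langchain_core.documents import Document

# Convert with Docling, then pass the structured text into a LangChain RAG pipeline.
source = "https://arxiv.org/pdf/2408.09869"
dl_result = DocumentConverter().convert(source)
lc_doc = Document(
    page_content=dl_result.document.export_to_markdown(),
    metadata={"source": source},  # arbitrary metadata for the retriever
)
# lc_doc can now be split, embedded, and indexed like any other LangChain document.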
Origin, license & governance
- The project was started by the Deep Search team at IBM Research Zurich and is hosted under the LF AI & Data Foundation.
- Licensed under MIT. Individual models used through the project retain their original licenses.
Technical report & docs
- A technical report documents inner workings and evaluation details (referenced in the project documentation).
- Full documentation, examples and integration guides are available on the project website and GitHub pages.
Roadmap & coming features
Planned improvements include richer metadata extraction (title, authors, references, language), chart understanding, and advanced domain-specific extractors (e.g., chemistry/molecular structures).
Who should use Docling
- Teams building RAG systems, document-centric agents, or search/indexing pipelines that need reliable structured document extraction.
- Organizations needing local/offline document processing for privacy-sensitive use cases.
