Docling — Document processing and understanding for gen-AI
Docling is an open-source toolkit and library that simplifies preparing documents for generative-AI applications. It focuses on robust parsing, structured representation, and integrations that let downstream systems (LLMs, retrieval systems, agents) consume documents reliably.
Key capabilities
- Parsing of many formats: PDF, DOCX, PPTX, XLSX, HTML, images (PNG, TIFF, JPEG, ...), audio (WAV, MP3), WebVTT, and more.
- Advanced PDF understanding: page layout & reading order, table structure recovery, code/formula detection, image classification and richer layout parsing.
- OCR support: processes scanned PDFs and images with OCR pipelines to extract text and layout.
- Audio/ASR support: transcription for audio inputs and support for WebVTT track parsing.
- Unified data model: a DoclingDocument representation that expresses structure, annotations, and metadata in a consistent format and can be exported to Markdown, HTML, JSON, or DocTags (see the export sketch after this list).
- Local-first operation: supports local execution and air-gapped environments for sensitive data handling.
- MCP server: a lightweight server to connect Docling processing into agentic workflows and pipelines.
- Integrations: plug-and-play adapters for LangChain, LlamaIndex, Haystack and others to accelerate RAG and agent setups.
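For illustration, here is a minimal sketch of working with the unified data model; the file path is a placeholder and the export_to_dict helper name should be verified against the installed version:
import json
from docling.document_converter import DocumentConverter

# Convert once, then export the same DoclingDocument in several formats.
doc = DocumentConverter().convert("report.pdf").document  # placeholder path or URL
markdown = doc.export_to_markdown()  # Markdown for prompting and indexing
with open("report.json", "w", encoding="utf-8") as fp:
    json.dump(doc.export_to_dict(), fp)  # lossless JSON (assumed helper name)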
Highlights & design goals
- Developer-friendly CLI and Python API for quick conversion and experimentation.
- Extensible pipeline architecture to swap OCR engines, VLMs, or layout models (see the configuration sketch after this list).
- Focus on quality of document structure extraction (tables, code blocks, formulas), which improves downstream retrieval and LLM prompting.
- Works across major OSes (macOS, Linux, Windows) and CPU architectures (x86_64, arm64).
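As an example of that extensibility, the following sketch enables OCR and table-structure recovery and swaps the OCR engine; the class and option names (PdfPipelineOptions, TesseractOcrOptions, do_ocr, do_table_structure) follow the documented pipeline-options pattern but may differ between releases:
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, TesseractOcrOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Enable OCR for scanned pages and table structure recovery,
# and swap the default OCR engine for Tesseract.
pipeline_options = PdfPipelineOptions(do_ocr=True, do_table_structure=True)
pipeline_options.ocr_options = TesseractOcrOptions()

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = converter.convert("scanned.pdf")  # placeholder path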
Typical usage
- Convert a PDF or URL to a DoclingDocument and export to Markdown/JSON for indexing into a retriever.
Example (Python):
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
CLI example:
docling https://arxiv.org/pdf/2206.01062
You can run with a visual-language model pipeline or enable OCR and other options via CLI flags.
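For the visual-language model path, a minimal Python sketch follows; the VlmPipeline class and pipeline_cls option are taken from recent Docling releases and should be checked against the current docs:
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline  # assumed module path

# Route PDF conversion through the VLM pipeline instead of the default layout models.
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline)}
)
result = converter.convert("https://arxiv.org/pdf/2206.01062")
print(result.document.export_to_markdown())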
Integrations & ecosystem
- Integrates with LangChain, LlamaIndex, Haystack, and other agent/LLM frameworks for retrieval-augmented generation and agentic document use (a minimal hand-off sketch follows this list).
- Supports visual-language models (e.g., GraniteDocling) and can be combined with VLMs hosted on Hugging Face or local model runtimes.
- Provides output formats tailored for downstream pipelines (lossless JSON, Markdown, HTML, DocTags).
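The dedicated adapters handle this hand-off for you; purely as an illustration of the basic pattern, the sketch below converts a document with Docling and wraps the Markdown export in a LangChain Document (the metadata fields are arbitrary choices):
from docling.document_converter import DocumentConverter
from langchain_core.documents import Document

# Convert with Docling, then pass the structured text into a LangChain RAG pipeline.
source = "https://arxiv.org/pdf/2408.09869"
dl_result = DocumentConverter().convert(source)
lc_doc = Document(
    page_content=dl_result.document.export_to_markdown(),
    metadata={"source": source},  # arbitrary metadata for the retriever
)
# lc_doc can now be split, embedded, and indexed like any other LangChain document.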
Origin, license & governance
- The project was started by the Deep Search team at IBM Research Zurich and is hosted under the LF AI & Data Foundation.
- Licensed under MIT. Individual models used through the project retain their original licenses.
Technical report & docs
- A technical report documents inner workings and evaluation details (referenced in the project documentation).
- Full documentation, examples and integration guides are available on the project website and GitHub pages.
Roadmap & coming features
Planned improvements include richer metadata extraction (title, authors, references, language), chart understanding, and advanced domain-specific extractors (e.g., chemistry/molecular structures).
Who should use Docling
- Teams building RAG systems, document-centric agents, or search/indexing pipelines that need reliable structured document extraction.
- Organizations needing local/offline document processing for privacy-sensitive use cases.
