AIAny - PDF Document Layout Analysis

Overview

PDF Document Layout Analysis is an open-source, Docker-powered microservice developed and maintained by HURIDOCS. It combines layout-aware machine learning models, OCR, and format conversion tools to analyze PDFs, segment and classify page elements (titles, text, tables, pictures, formulas, footnotes, headers/footers, etc.), determine reading order, and export results into JSON, Markdown, or HTML. The project exposes both a user-friendly Gradio web UI and a comprehensive REST API for automation and integration.

Key Capabilities

Layout segmentation and classification using two model families:
- VGT (Vision Grid Transformer) for high visual-accuracy layout understanding.
- LightGBM-based models for fast CPU processing and batch workloads.
OCR through Tesseract and ocrmypdf, with 150+ languages supported.
Table extraction (HTML), formula extraction (LaTeX), and caption/footnote detection.
Reading-order resolution and segmentation metadata (coordinates, page size, page number).
Format conversion endpoints: Markdown and HTML exports, with segmentation data packaged in zip files.
Automatic translation support using Ollama models (configurable translation model list).
Visual overlays and interactive analysis via Gradio UI.
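
As an illustration of how the segmentation metadata (type, coordinates, page size, page number) might be consumed, here is a minimal Python sketch. The field names (type, text, page_number, left, top, width, height, page_width, page_height) and the sample values are assumptions made for this example, not a documented schema; check the JSON your service actually returns before relying on them.

```python
from collections import defaultdict

# Hypothetical sample of segmentation output; field names and values are assumptions.
SAMPLE_SEGMENTS = [
    {"type": "Title", "text": "Annual Report", "page_number": 1,
     "left": 60.0, "top": 80.0, "width": 300.0, "height": 40.0,
     "page_width": 600.0, "page_height": 800.0},
    {"type": "Text", "text": "This report covers the last year.", "page_number": 1,
     "left": 60.0, "top": 160.0, "width": 480.0, "height": 80.0,
     "page_width": 600.0, "page_height": 800.0},
    {"type": "Table", "text": "Region Cases", "page_number": 2,
     "left": 60.0, "top": 100.0, "width": 480.0, "height": 200.0,
     "page_width": 600.0, "page_height": 800.0},
]


def normalized_box(segment):
    """Return (x0, y0, x1, y1) as fractions of the page width and height."""
    x0 = segment["left"] / segment["page_width"]
    y0 = segment["top"] / segment["page_height"]
    x1 = (segment["left"] + segment["width"]) / segment["page_width"]
    y1 = (segment["top"] + segment["height"]) / segment["page_height"]
    return (x0, y0, x1, y1)


def group_by_type(segments):
    """Group segments by their type label, preserving the input order."""
    groups = defaultdict(list)
    for segment in segments:
        groups[segment["type"]].append(segment)
    return dict(groups)


if __name__ == "__main__":
    for kind, items in group_by_type(SAMPLE_SEGMENTS).items():
        print(kind, [normalized_box(s) for s in items])
```

Normalizing to page fractions keeps boxes comparable across pages of different sizes, which is convenient when drawing overlays or filtering by region.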

API & Usage

The project runs as a service, with the API on port 5060 and the UI on port 7860 by default. Endpoints include / (POST, analyze), /text, /markdown, /html, /ocr, /visualize, and /toc, plus utility endpoints such as /info.
Example quick commands (service running locally):
- Analyze PDF: curl -X POST -F 'file=@document.pdf' http://localhost:5060
- Fast analysis: add -F 'fast=true' to the same request (uses the LightGBM models)
- Convert to Markdown with translation: POST to /markdown with target_languages and translation_model.
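
The curl commands above can be mirrored from Python. The sketch below builds the same multipart POST using only the standard library; the helper names build_multipart and build_request are hypothetical, and the base URL assumes the default local port mentioned above. Nothing is sent until the request is passed to urllib.request.urlopen.

```python
import io
import urllib.request
import uuid


def build_multipart(fields, filename, file_bytes):
    """Encode form fields plus one PDF upload as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    # The upload goes in a form field named "file", matching -F 'file=@document.pdf'.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/pdf\r\n\r\n")
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def build_request(pdf_bytes, filename="document.pdf", endpoint="/", fast=False,
                  base_url="http://localhost:5060", extra_fields=None):
    """Build (but do not send) a POST request for the analysis service."""
    fields = dict(extra_fields or {})
    if fast:
        fields["fast"] = "true"  # same effect as -F 'fast=true'
    body, content_type = build_multipart(fields, filename, pdf_bytes)
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Sending requires a running service, for example:
#   with open("document.pdf", "rb") as fh:
#       req = build_request(fh.read(), fast=True)
#   with urllib.request.urlopen(req) as resp:
#       payload = resp.read()
```

For the /markdown endpoint, target_languages and translation_model could be passed through extra_fields with endpoint="/markdown"; the exact value format those fields expect is not specified here, so consult the service documentation before using them.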

Models & Performance

VGT makes stronger use of visual context and is recommended when a GPU is available; LightGBM offers much faster CPU throughput for large batches.
Uses DocLayNet for training data and ships pre-built model configurations.

Deployment & Dev

Fully Dockerized (Docker Compose), with optional GPU support via the NVIDIA Container Toolkit.
Development helpers: make start, make stop, make install, and test commands.
Configurable environment variables for OCR path, models path, ports, and Ollama endpoint.

Typical Use Cases

Digitizing scanned documents and extracting structured content (TOC, tables, figures).
Converting institutional PDFs to Markdown/HTML while preserving layout and structure.
Building document search/indexing pipelines that require segmented content and reading order.
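
For the search/indexing use case, one possible approach is to turn reading-ordered segments into heading-scoped chunks. The sketch below is only an illustration: the function name chunk_by_heading is hypothetical, the segment fields (type, text, page_number) are assumed, and the type labels ("Title", "Section header", "Page header", "Page footer") are assumed to follow DocLayNet-style category names.

```python
def chunk_by_heading(segments,
                     heading_types=("Title", "Section header"),
                     skip_types=("Page header", "Page footer")):
    """Group reading-ordered segments into chunks that start at each heading."""
    chunks = []
    current = {"heading": None, "page": None, "text": []}
    for segment in segments:  # assumed to already be in reading order
        kind = segment["type"]
        if kind in skip_types:
            continue  # running headers/footers add noise to a search index
        if kind in heading_types:
            # Close the previous chunk before starting a new one.
            if current["heading"] is not None or current["text"]:
                chunks.append(current)
            current = {"heading": segment["text"],
                       "page": segment["page_number"],
                       "text": []}
        else:
            if current["page"] is None:
                current["page"] = segment["page_number"]
            current["text"].append(segment["text"])
    if current["heading"] is not None or current["text"]:
        chunks.append(current)
    return [
        {"heading": c["heading"], "page": c["page"], "text": " ".join(c["text"])}
        for c in chunks
    ]
```

Each chunk keeps its heading and starting page, so a search hit can be traced back to a location in the original PDF.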

Integrations & Extensibility

Works with Hugging Face (models/artifacts) and Docker Hub images provided by HURIDOCS.
Translation step is pluggable through Ollama model selection.
Clean Architecture codebase makes it straightforward to extend model adapters, add endpoints, or swap OCR engines.

License & Community

Open-source project (see the repository LICENSE). Contributions, issues, and PRs are welcome; the repository contains developer docs, tests, and a contribution guide.
