Overview
BabelDOC is an open-source library and command-line tool designed to translate PDFs (with a focus on scientific papers) and generate bilingual comparison outputs. It is built to be both embeddable in larger pipelines and usable directly by end users for straightforward translation tasks. The project emphasizes preserving document layout, offering flexible PDF-processing controls, and leveraging OpenAI-compatible LLMs for high-quality translation.
Key features
- PDF-first translator: parsing and rendering pipeline tailored for PDF documents, aiming to keep layout and reduce content loss.
- Bilingual/dual PDF output: supports monolingual or bilingual outputs (original + translated) with configurable page arrangement and watermark modes.
- LLM-based translation: works with OpenAI-compatible APIs and models (default config examples reference models like gpt-4o-mini), and can be pointed at alternative compatible endpoints.
- CLI and Python API: provides a command-line interface (babeldoc) and guidance to call via higher-level functions (recommended integration with pdf2zh_next / PDFMathTranslate-next for async streaming usage).
- Rich PDF processing options: OCR handling, scanned-document workarounds, short-line splitting, table/formula handling (experimental), font-family control for translated text, and more.
- Glossary & terminology: automatic term extraction with options to load custom glossary CSVs to control translation of domain-specific terms.
- Self-deployment & integration: compatible with PDFMathTranslate-next for self-deployment and WebUI; also used by Immersive Translate online service (BabelDOC beta).
Typical usage
- Quick CLI example (using an OpenAI-compatible service):
  babeldoc --openai --openai-model "gpt-4o-mini" --openai-base-url "https://api.openai.com/v1" --openai-api-key "your-api-key" --files example.pdf
- Install from PyPI or run via source; the project recommends using uv to manage tools/environments for easy installation and execution.
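When BabelDOC is driven from a script rather than typed by hand, the CLI call can be assembled as an argument list. The following is a minimal sketch, not part of the project: the helper name build_babeldoc_command and the fallback to an OPENAI_API_KEY environment variable are assumptions made for illustration; only the flags from the quick example are used.

```python
import os
import shlex


def build_babeldoc_command(pdf_path, model="gpt-4o-mini",
                           base_url="https://api.openai.com/v1",
                           api_key=None):
    """Return the argv list for a babeldoc run against an OpenAI-compatible API."""
    # Assumption for this sketch: fall back to OPENAI_API_KEY when no key is passed.
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("no API key given and OPENAI_API_KEY is not set")
    return [
        "babeldoc",
        "--openai",
        "--openai-model", model,
        "--openai-base-url", base_url,
        "--openai-api-key", key,
        "--files", pdf_path,
    ]


cmd = build_babeldoc_command("example.pdf", api_key="your-api-key")
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To actually run it (requires babeldoc to be installed and on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with file names that contain spaces, and reading the key from the environment keeps it out of scripts and shell history.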
Options & customization
BabelDOC exposes many options to control translation behaviour and PDF output, including:
- Page selection, splitting into parts, short-line splitting and thresholds.
- OCR/scan detection, OCR workaround and auto-enable behaviours for heavily scanned docs.
- Output control: --no-dual, --no-mono, watermark modes, maximum pages-per-part, and output directory.
- Translation service tuning: QPS limits, caching, model choice, custom system prompts, and worker/thread controls.
- Glossary loading and per-entry target-language constraints to ensure glossary entries apply only when appropriate.
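To make the per-entry target-language constraint concrete, here is a small sketch of how such filtering could work. It is an illustration only, not BabelDOC's implementation: the column names source, target, and tgt_lng, the sample terms, and the helper entries_for_language are all assumptions; check the project README for the exact CSV layout the tool expects.

```python
import csv
import io

# Assumed layout: one row per term with "source", "target", and an optional
# "tgt_lng" column that restricts the entry to one target language.
GLOSSARY_CSV = """source,target,tgt_lng
attention mechanism,注意力机制,zh-CN
transformer,Transformer,
gradient descent,descente de gradient,fr
"""


def entries_for_language(csv_text, lang_out):
    """Keep entries with no language constraint or one matching lang_out."""
    rows = csv.DictReader(io.StringIO(csv_text))
    selected = {}
    for row in rows:
        constraint = (row.get("tgt_lng") or "").strip().lower()
        # An empty constraint means the entry applies to every target language.
        if not constraint or constraint == lang_out.lower():
            selected[row["source"]] = row["target"]
    return selected


zh_terms = entries_for_language(GLOSSARY_CSV, "zh-CN")
print(zh_terms)
```

With this scheme, an entry that leaves the language column empty is shared across all translation directions, while a constrained entry is ignored unless the requested output language matches.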
Integration & deployment
- Online: Immersive Translate provides a beta web app for BabelDOC (with free quota details shown on their site).
- Self-host: recommended path is to integrate with PDFMathTranslate-next (a self-deployable pipeline) for full local control and extended translator/backends.
Limitations & known issues
- Primary focus is English→Chinese; other language directions are less tested though basic support exists.
- Known parsing errors in author/reference sections and some layout edge cases (lines, drop caps, very large pages) may lead to merged paragraphs or skipped pages.
- Some advanced features (table translation, formula detection patterns) are experimental and may produce imperfect results.
Roadmap & maintenance
The project lists ongoing goals such as improved line and table support, cross-page paragraph handling, more robust typesetting features, and outline support. The repository is actively maintained by the funstory-ai organization and provides contributor incentives (Immersive Translate Pro codes) for active contributors.
Links & resources
- Project repo: https://github.com/funstory-ai/BabelDOC
- Project homepage / docs: https://funstory-ai.github.io/BabelDOC/
- Online beta: Immersive Translate BabelDOC app (refer to project README for link)
- PyPI package: BabelDOC (badges and installation hints available in README)
License
BabelDOC includes a license file in the repository; check the repo for current license terms.
