ebook2audiobook

ebook2audiobook is an open-source tool that converts eBooks (epub, pdf, mobi, txt, etc.) into organized audiobooks with chapters and metadata. It supports multiple TTS engines (XTTSv2, Bark, VITS, Fairseq, YourTTS, Tacotron2 and more), optional voice cloning, and up to 1,158 languages. It offers a Gradio GUI, a headless mode, Docker support, and remote demos (Hugging Face, Colab). It is intended for legally acquired, non-DRM eBooks.

Introduction

ebook2audiobook — Overview

ebook2audiobook is an open-source converter that turns eBooks into structured audiobooks. It integrates a variety of modern TTS engines (XTTSv2, Bark, VITS, Fairseq, Tacotron2, YourTTS, etc.) and supports optional voice cloning and custom fine-tuned models. The project aims to produce chapter-aware output files with metadata in common audio formats (m4b/m4a/mp3/flac/wav/ogg etc.).

Key features
  • Chapter splitting: extracts or infers chapters for organized audiobook output.
  • Multi-engine support: selectable TTS backends including XTTSv2, Bark, Fairseq, VITS, Tacotron2, YourTTS.
  • Voice cloning & custom models: optional use of your own reference audio or custom model archives (.zip) for personalized voices.
  • Wide language coverage: supports 1,158+ languages/dialects via compatible TTS models.
  • Multiple frontends: Gradio web GUI for interactive use, headless CLI mode for batch processing, and Docker images for reproducible setups.
  • Output formats & metadata: exports to audiobook-friendly formats (e.g., .m4b) and supports chapter metadata and different audio containers.

Usage & deployment
  • Local run: launch the provided platform scripts (./ebook2audiobook.sh on macOS/Linux, ebook2audiobook.cmd on Windows) or run app.py in headless mode.
  • Docker: official Docker build + run instructions are included for CPU, CUDA, ROCm, XPU and Jetson targets.
  • Remote demos: the project provides a Hugging Face Space and Colab/Kaggle notebooks for running it without local setup.
  • CLI options: supports --headless, --ebook, --language, --voice, --tts_engine, --custom_model, --output_format, and many fine-grained TTS parameters.
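
As a rough illustration of how the flags listed above might combine for a batch run, the sketch below only assembles an argument list and prints it; nothing is executed. The flag names come from the list above. Everything else is an assumption for illustration: the helper name build_headless_command is hypothetical, the sample values ("book.epub", "eng", "m4b") are placeholders, and it assumes the launcher script accepts the same flags as app.py and that --headless takes no value. Check the project's own help output for the accepted values.

```python
import shlex
from typing import Optional


def build_headless_command(
    ebook: str,
    language: Optional[str] = None,
    voice: Optional[str] = None,
    tts_engine: Optional[str] = None,
    custom_model: Optional[str] = None,
    output_format: Optional[str] = None,
    launcher: str = "./ebook2audiobook.sh",  # assumed to accept these flags
) -> list[str]:
    """Assemble an argument list for a headless run (nothing is executed)."""
    cmd = [launcher, "--headless", "--ebook", ebook]
    # Optional flags are appended only when a value was supplied.
    optional_flags = {
        "--language": language,
        "--voice": voice,
        "--tts_engine": tts_engine,
        "--custom_model": custom_model,
        "--output_format": output_format,
    }
    for flag, value in optional_flags.items():
        if value is not None:
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    # Placeholder values; the real language codes and formats may differ.
    cmd = build_headless_command("book.epub", language="eng", output_format="m4b")
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps file paths with spaces intact if the list is later handed to subprocess.run.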

Technical & hardware notes
  • Minimum: 2 GB RAM and 1 GB VRAM; 8 GB RAM and 4 GB VRAM are recommended for comfortable performance.
  • CPU-only generation is supported but can be slow with modern TTS engines; a GPU (CUDA/ROCm/XPU) or Apple MPS is recommended for speed.
  • Designed to accept many ebook formats (.epub, .pdf, .mobi, .txt and more); .epub/.mobi give best automatic chapter detection.
  • The repository includes an explicit warning: use only with non-DRM, legally acquired eBooks. The authors disclaim responsibility for misuse.
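
To make the accelerator note above concrete, here is a minimal sketch of choosing a device with a CPU fallback. It is not taken from the project: it assumes a PyTorch-based setup, the function names choose_device and probe_backends are hypothetical, and the priority order (CUDA, then XPU, then MPS, then CPU) is an arbitrary choice for illustration. ROCm builds of PyTorch report their GPUs through the torch.cuda interface, so they fall under the "cuda" label here.

```python
def choose_device(cuda: bool = False, xpu: bool = False, mps: bool = False) -> str:
    """Return a device label given which backends are available."""
    if cuda:  # also covers ROCm builds of PyTorch
        return "cuda"
    if xpu:
        return "xpu"
    if mps:
        return "mps"
    return "cpu"  # always works, but can be slow for modern TTS engines


def probe_backends() -> dict:
    """Ask PyTorch which backends are usable; all False if torch is missing."""
    found = {"cuda": False, "xpu": False, "mps": False}
    try:
        import torch
    except ImportError:
        return found
    found["cuda"] = torch.cuda.is_available()
    # Older PyTorch releases may lack these modules, so look them up defensively.
    xpu = getattr(torch, "xpu", None)
    found["xpu"] = bool(xpu is not None and xpu.is_available())
    mps = getattr(torch.backends, "mps", None)
    found["mps"] = bool(mps is not None and mps.is_available())
    return found


if __name__ == "__main__":
    print(choose_device(**probe_backends()))
```

Keeping the decision logic separate from the probing makes the fallback order easy to check without any GPU present.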

Extensibility & community
  • Fine-tuning: documentation and notebooks are provided to fine-tune XTTSv2 models and to add custom voice models.
  • Contributions: the project maintains an issue tracker, a wiki (GPU issues, troubleshooting), and a Discord server for community help.

Metadata
  • Repository created: 2024-01-22.
  • Author/maintainer: Drew Thomasson (GitHub: DrewThomasson).

Information

  • Website: github.com
  • Authors: Drew Thomasson (GitHub: DrewThomasson)
  • Published date: 2024/01/22