ebook2audiobook — Overview
ebook2audiobook is an open-source converter that turns eBooks into structured audiobooks. It integrates several modern TTS engines (XTTSv2, Bark, VITS, Fairseq, Tacotron2, YourTTS, and others) and supports optional voice cloning and custom fine-tuned models. The project aims to produce chapter-aware output files with metadata in common audio formats (m4b, m4a, mp3, flac, wav, ogg, and others).
Key features
- Chapter splitting: extracts or infers chapters for organized audiobook output.
- Multi-engine support: selectable TTS backends including XTTSv2, Bark, Fairseq, VITS, Tacotron2, YourTTS.
- Voice cloning & custom models: optional use of your own reference audio or custom model archives (.zip) for personalized voices.
- Wide language coverage: supports 1,158+ languages/dialects via compatible TTS models.
- Multiple frontends: Gradio web GUI for interactive use, headless CLI mode for batch processing, and Docker images for reproducible setups.
- Output formats & metadata: exports to audiobook-friendly formats (e.g., .m4b) and supports chapter metadata and different audio containers.
Usage & deployment
- Local run: launch the provided platform scripts (./ebook2audiobook.sh on macOS/Linux, ebook2audiobook.cmd on Windows) or run app.py in headless mode.
- Docker: official Docker build + run instructions are included for CPU, CUDA, ROCm, XPU and Jetson targets.
- Remote demos: the project provides a Hugging Face Space and Colab/Kaggle notebooks so it can be run without local setup.
- CLI options: supports --headless, --ebook, --language, --voice, --tts_engine, --custom_model, --output_format, and many fine-grained TTS parameters.
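As an illustration of headless batch use, the sketch below assembles a command line from the flags listed above and could be handed to `subprocess.run`. The flag names come from the list; the helper `build_command`, the default script path, and example values such as "eng" and "m4b" are assumptions, so check the project's own help output for the accepted values.

```python
import shlex


def build_command(ebook, language="eng", voice=None, tts_engine=None,
                  custom_model=None, output_format=None,
                  script="./ebook2audiobook.sh"):
    """Build an argument list for a headless conversion (hypothetical helper)."""
    cmd = [script, "--headless", "--ebook", ebook, "--language", language]
    optional = {
        "--voice": voice,
        "--tts_engine": tts_engine,
        "--custom_model": custom_model,
        "--output_format": output_format,
    }
    # Only pass the flags the caller actually set.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    cmd = build_command("book.epub", output_format="m4b")
    # Print a shell-safe preview; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.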
Technical & hardware notes
- Minimum: 2 GB RAM and 1 GB VRAM. Recommended for comfortable performance: 8 GB RAM and 4 GB VRAM.
- CPU-only generation is supported but can be slow with modern TTS engines; a GPU (CUDA/ROCm/XPU) or Apple MPS is recommended for speed.
- Designed to accept many ebook formats (.epub, .pdf, .mobi, .txt and more); .epub/.mobi give best automatic chapter detection.
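Formats such as .txt carry no chapter structure, which is one reason detection works best with .epub/.mobi. The sketch below shows one simple way chapters could be inferred from plain text by matching heading lines. It is a heuristic written for illustration, not the project's actual algorithm; `HEADING` and `split_chapters` are hypothetical names.

```python
import re

# Lines that start with "Chapter" or "Part" followed by an Arabic or Roman numeral.
HEADING = re.compile(
    r"^(chapter|part)\s+(\d+|[ivxlcdm]+)\b.*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_chapters(text):
    """Split plain text into (title, body) pairs at detected heading lines."""
    matches = list(HEADING.finditer(text))
    if not matches:
        # No headings found: treat the whole text as a single chapter.
        return [("Untitled", text.strip())]

    chapters = []
    front = text[:matches[0].start()].strip()
    if front:
        # Keep any text before the first heading instead of dropping it.
        chapters.append(("Front matter", front))

    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        chapters.append((match.group(0).strip(), body))
    return chapters


if __name__ == "__main__":
    sample = "Preface text\nChapter 1: Start\nHello.\nChapter 2\nWorld."
    for title, body in split_chapters(sample):
        print(f"{title!r}: {len(body)} characters")
```

A pattern this simple will misfire on some books (for example, a sentence that begins a line with "Part 2 of the plan"), which is why formats with explicit structure give better results.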
Legal & usage caution
- The repository includes an explicit warning: use only with non-DRM, legally acquired eBooks. The authors disclaim responsibility for misuse.
Extensibility & community
- Fine-tuning: documentation and notebooks are provided to fine-tune XTTSv2 models and to add custom voice models.
- Contributions: the project maintains an issue tracker, a wiki (GPU issues, troubleshooting), and a Discord server for community help.
Metadata
- Repository created: 2024-01-22.
- Author/maintainer: Drew Thomasson (GitHub: DrewThomasson).
