AIAny - ebook2audiobook

ebook2audiobook — Overview

ebook2audiobook is an open-source converter that turns eBooks into structured audiobooks. It integrates a variety of modern TTS engines (XTTSv2, Bark, VITS, Fairseq, Tacotron2, YourTTS, etc.) and supports optional voice cloning and custom fine-tuned models. The project aims to produce chapter-aware output files with metadata in common audio formats (m4b/m4a/mp3/flac/wav/ogg etc.).

Key features

Chapter splitting: extracts or infers chapters for organized audiobook output.
Multi-engine support: selectable TTS backends including XTTSv2, Bark, Fairseq, VITS, Tacotron2, YourTTS.
Voice cloning & custom models: optional use of your own reference audio or custom model archives (.zip) for personalized voices.
Wide language coverage: supports 1,158+ languages/dialects via compatible TTS models.
Multiple frontends: Gradio web GUI for interactive use, headless CLI mode for batch processing, and Docker images for reproducible setups.
Output formats & metadata: exports to audiobook-friendly formats (e.g., .m4b) and supports chapter metadata and different audio containers.
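The overview does not say how ebook2audiobook writes chapter metadata internally. As a hedged illustration, one common way to give an .m4b named chapters is FFmpeg's FFMETADATA1 text format, where each [CHAPTER] block gives a start and end time. The `Chapter` class, the `build_ffmetadata` and `escape_ffmeta` helpers, and the use of per-chapter durations in milliseconds are all assumptions for this sketch, not the project's API.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    # Hypothetical record: a chapter title plus the length of its synthesized audio.
    title: str
    duration_ms: int


def escape_ffmeta(value: str) -> str:
    # FFMETADATA requires '=', ';', '#', '\' and newlines to be backslash-escaped.
    # The backslash is escaped first so later escapes are not doubled.
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(chapters: list[Chapter], album: str) -> str:
    """Return an FFMETADATA1 document with one [CHAPTER] block per chapter."""
    lines = [";FFMETADATA1", f"title={escape_ffmeta(album)}"]
    start = 0
    for ch in chapters:
        end = start + ch.duration_ms
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START/END below are in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={escape_ffmeta(ch.title)}",
        ]
        start = end  # chapters are laid out back to back
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_ffmetadata([Chapter("Intro", 1500), Chapter("Chapter 1", 60000)], "My Book"))
```

Assuming the concatenated narration is `book.m4a` and the generated text is saved as `chapters.txt` (both placeholder names), FFmpeg can typically merge them with `ffmpeg -i book.m4a -i chapters.txt -map_metadata 1 -map_chapters 1 -c copy book.m4b`.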

Usage & deployment

Local run: launch the provided platform scripts (./ebook2audiobook.sh on macOS/Linux, ebook2audiobook.cmd on Windows) or run app.py in headless mode.
Docker: official Docker build and run instructions are included for CPU, CUDA, ROCm, XPU and Jetson targets.
Remote demos: the project provides a Hugging Face Space and Colab/Kaggle notebooks for running without a local setup.
CLI options: supports --headless, --ebook, --language, --voice, --tts_engine, --custom_model, --output_format, and many fine-grained TTS parameters.
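For batch jobs, the headless options listed above can be assembled programmatically. This is a minimal sketch: only the flag names come from the option list; the `build_headless_command` helper is hypothetical, and example values such as the language code "eng" and the output format "m4b" are placeholders to check against the project's own `--help` output.

```python
import shlex
from typing import Optional


def build_headless_command(
    ebook: str,
    language: Optional[str] = None,
    voice: Optional[str] = None,
    tts_engine: Optional[str] = None,
    custom_model: Optional[str] = None,
    output_format: Optional[str] = None,
) -> list[str]:
    # Hypothetical wrapper around the documented flags; values are not validated here.
    cmd = ["./ebook2audiobook.sh", "--headless", "--ebook", ebook]
    optional = {
        "--language": language,
        "--voice": voice,
        "--tts_engine": tts_engine,
        "--custom_model": custom_model,
        "--output_format": output_format,
    }
    for flag, value in optional.items():
        if value is not None:  # omit unset flags so the tool's defaults apply
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    cmd = build_headless_command("mybook.epub", language="eng", output_format="m4b")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute it
```

Returning an argument list rather than a single shell string avoids quoting problems with file names that contain spaces.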

Technical & hardware notes

Minimum requirements: 2 GB RAM and 1 GB VRAM (8 GB RAM and 4 GB VRAM are recommended for comfortable performance).
CPU-only generation is supported but can be slow with modern TTS engines; a GPU (CUDA/ROCm/XPU) or Apple MPS is recommended for speed.
Accepts many eBook formats (.epub, .pdf, .mobi, .txt and more); .epub and .mobi give the best automatic chapter detection.
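The hardware thresholds and format notes above can be turned into a quick pre-run check. This is a hypothetical sketch, not part of ebook2audiobook: the `preflight` function and its messages are illustrative, it treats VRAM as dedicated GPU memory (Apple MPS unified memory is not modeled), and it uses the file extension as a rough proxy for chapter-detection quality.

```python
from pathlib import Path

# Thresholds (GB) taken from the hardware notes.
MIN_RAM_GB, MIN_VRAM_GB = 2, 1
REC_RAM_GB, REC_VRAM_GB = 8, 4
# Formats the notes single out for the best automatic chapter detection.
BEST_CHAPTER_FORMATS = {".epub", ".mobi"}


def preflight(ebook: str, ram_gb: float, vram_gb: float) -> list[str]:
    """Return warnings; an empty list means no concerns were found."""
    warnings = []
    if ram_gb < MIN_RAM_GB:
        warnings.append("RAM below the stated 2 GB minimum")
    elif ram_gb < REC_RAM_GB:
        warnings.append("RAM below the recommended 8 GB")
    if vram_gb == 0:
        warnings.append("no GPU VRAM: CPU-only generation is supported but can be slow")
    elif vram_gb < MIN_VRAM_GB:
        warnings.append("VRAM below the stated 1 GB minimum")
    elif vram_gb < REC_VRAM_GB:
        warnings.append("VRAM below the recommended 4 GB")
    suffix = Path(ebook).suffix.lower()  # case-insensitive extension check
    if suffix not in BEST_CHAPTER_FORMATS:
        warnings.append(f"{suffix or 'no extension'}: chapter detection may be less reliable than .epub/.mobi")
    return warnings


if __name__ == "__main__":
    for warning in preflight("novel.pdf", ram_gb=4, vram_gb=0):
        print("warning:", warning)
```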

Legal & usage caution

The repository includes an explicit warning: use only with non-DRM, legally acquired eBooks. The authors disclaim responsibility for misuse.

Extensibility & community

Fine-tuning: documentation and notebooks are provided to fine-tune XTTSv2 models and to add custom voice models.
Contributions: the project maintains an issue tracker, a wiki (GPU issues, troubleshooting) and a Discord server for community help.

Metadata

Repository created: 2024-01-22.
Author/maintainer: Drew Thomasson (GitHub: DrewThomasson).


More Items

Buzz

faster-whisper

LiveKit Agents