WAXAL addresses a major gap in speech resources for Sub‑Saharan African languages by releasing large, curated ASR and TTS collections suitable for model training and evaluation. The release bundles natural, image‑prompted ASR recordings with studio‑quality, single‑speaker TTS recordings of prepared scripts. Together these support speech tasks from recognition to synthesis, while foregrounding local partnerships and ethical considerations.
What Sets It Apart
- Scale and breadth: roughly 1,250 hours of transcribed, natural ASR audio and ~235 hours of high‑quality TTS across 24 languages, with metadata on speaker age, gender and recording environment. This makes it one of the largest open multilingual African speech resources.
- Dual modalities: an ASR set (diverse, spontaneous speech, of which only ~10% of the collected audio is transcribed) and a TTS set (phonetically balanced scripts, single‑speaker studio recordings), each designed for different downstream needs.
- Open licensing and curation: datasets are released under CC‑BY / CC‑BY‑SA variants, collected with local partners and paid annotators; quality control and PII removal were applied during curation.
Who It's For and Tradeoffs
Great fit if you need training or benchmarking data for ASR/TTS models in low‑resource African languages, multilingual transfer experiments, or linguistic analysis of speech patterns. Look elsewhere if you require a transcript for every sample (only ~10% of the collected ASR audio is transcribed) or exhaustive dialectal coverage, since dialectal and sociolinguistic variation may be underrepresented. Licensing also varies across providers, so check the license for each language before commercial use.
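The per‑language license check can be sketched in a few lines of Python. This is a minimal illustration, not part of the WAXAL release: the `LICENSES` table, its placeholder language keys, and both helper functions are hypothetical, and the real license for each language has to be read from the dataset's own documentation. The sketch groups languages by license and flags the share‑alike (CC‑BY‑SA) ones, since share‑alike terms carry over to derivative works.

```python
from collections import defaultdict

# Hypothetical per-language license table. The keys are placeholders,
# NOT real WAXAL language codes or real license assignments.
LICENSES = {
    "lang_a": "CC-BY",
    "lang_b": "CC-BY-SA",
    "lang_c": "CC-BY",
}


def group_by_license(licenses):
    """Return a dict mapping each license string to a sorted list of languages."""
    groups = defaultdict(list)
    for lang, lic in sorted(licenses.items()):
        groups[lic].append(lang)
    return dict(groups)


def share_alike_languages(licenses):
    """Return the languages whose license string contains a share-alike (-SA) clause."""
    return sorted(lang for lang, lic in licenses.items() if "-SA" in lic.upper())


if __name__ == "__main__":
    for lic, langs in group_by_license(LICENSES).items():
        print(f"{lic}: {', '.join(langs)}")
    print("Share-alike applies to:", share_alike_languages(LICENSES))
```

Filling the table by hand from the published license notices keeps the check explicit, and the share‑alike list marks the languages whose terms deserve a closer read before any commercial release.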
