Waxal NLP Datasets

Provides open ASR and TTS speech data for 24 Sub‑Saharan African languages to train and evaluate speech models. Includes ~1,250 hours of transcribed ASR audio and ~235 hours of single‑speaker TTS recordings, with train/validation/test/unlabeled splits and a mix of CC‑BY and CC‑BY‑SA licenses.

Introduction

WAXAL addresses a major gap in speech resources for Sub‑Saharan African languages by releasing large, curated ASR and TTS collections suitable for model training and evaluation. The release bundles natural, image‑prompted ASR recordings with studio‑quality single‑speaker TTS recordings, supporting speech tasks from recognition to synthesis while foregrounding local partnerships and ethical considerations.

What Sets It Apart

Scale and breadth: roughly 1,250 hours of transcribed, natural ASR audio and ~235 hours of high‑quality TTS across 24 languages, with metadata on speaker age, gender and recording environment. This makes it one of the largest open multilingual African speech resources.
Dual modalities: includes both ASR data (diverse, spontaneous speech; about 10% of the collected audio is transcribed) and TTS data (phonetically balanced scripts, single‑speaker studio recordings), each designed for different downstream needs.
Open licensing and curation: datasets are released under CC‑BY / CC‑BY‑SA variants, collected with local partners and paid annotators; quality control and PII removal were applied during curation.

Who It's For and Tradeoffs

Great fit if you need training or benchmarking data for ASR/TTS models in low‑resource African languages, multilingual transfer experiments, or linguistic analysis of speech patterns. Look elsewhere if you require fully transcribed corpora for every sample (only ~10% of collected ASR audio is transcribed) or if you need exhaustive dialectal coverage; dialectal and socio‑linguistic variation may be underrepresented. Licensing also varies across providers, so check the license for each language before commercial use.
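
The per‑language license check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset's tooling: the SubsetInfo record, the split_by_license helper, the language codes and the license strings are all hypothetical placeholders. Real values should be read from each language's dataset card.

```python
from dataclasses import dataclass


# Hypothetical record describing one language subset. The field names are
# assumptions for illustration, not the dataset's actual metadata schema.
@dataclass(frozen=True)
class SubsetInfo:
    language: str
    task: str      # "asr" or "tts"
    license: str   # license identifier, e.g. "CC-BY-4.0" or "CC-BY-SA-4.0"


def is_share_alike(license_id: str) -> bool:
    """Return True when the license identifier carries a ShareAlike (SA) term."""
    return "-SA" in license_id.upper()


def split_by_license(subsets):
    """Partition subsets into (attribution-only, share-alike) lists."""
    attribution_only = [s for s in subsets if not is_share_alike(s.license)]
    share_alike = [s for s in subsets if is_share_alike(s.license)]
    return attribution_only, share_alike


# Placeholder entries; none of these pairings come from the dataset itself.
example = [
    SubsetInfo("lang_a", "asr", "CC-BY-4.0"),
    SubsetInfo("lang_b", "asr", "CC-BY-SA-4.0"),
    SubsetInfo("lang_c", "tts", "CC-BY-4.0"),
]

attribution_only, share_alike = split_by_license(example)
print("Attribution only:", [s.language for s in attribution_only])
print("Share-alike:", [s.language for s in share_alike])
```

Separating the share‑alike subsets first makes it easier to review them individually, since ShareAlike terms place conditions on how adapted material is redistributed.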

Information

Website: huggingface.co
Organizations: Google Research, Makerere University, University of Ghana, Digital Umuganda, Media Trust, Loud and Clear, AIMS Senegal, Bill & Melinda Gates Foundation
Published date: 2026/01/19

More Items

Joyo Kanji Yomi Benchmark

2026

sbintuitions

Provides kanji-level evaluation data for Japanese TTS: disambiguated sentence contexts targeting 4,378 kanji-reading pairs (2,136 Jōyō kanji) with 13,095 native-speaker–verified sentences and katakana-marked ground-truth readings for kanji-level error metrics.

tts speech evaluation huggingface nlp+4

SceneFun3D

2024

ETH Zurich, Google +2

Alexandros Delitzas, Ayca Takmaz +4

Provides point-accurate annotations of interactive parts in high-resolution indoor laser-scan point clouds, plus affordance labels, motion axes and natural-language task descriptions; includes aligned iPad RGB-D video slices with 2D projections for multimodal research.

robotics vision depth multimodal huggingface+1

xlangai/osworld_v2_tasks

2026

xlangai

Provides the gated, official OSWorld 2.0 Python task class files (task_*.py) required to run the benchmark; distributed via a Hugging Face gated dataset to reduce benchmark leakage. Download requires accepting gated access on Hugging Face.

huggingface evaluation agent-skills ai-agent json+2