Overview
fairseq is an extensible, high-performance sequence modeling toolkit developed by Facebook AI Research (FAIR) and implemented in Python with PyTorch. It aims to provide reference implementations of important sequence-modeling architectures and research papers, plus practical tooling for training and evaluating models for translation, language modeling, summarization, speech recognition/generation, and other text/audio sequence tasks.
Key features
- Reference implementations of many architectures and papers (Transformer, convolutional seq2seq, LSTM-based models, LightConv/DynamicConv, non-autoregressive methods, etc.).
- Speech and self-supervised audio support (wav2vec, wav2vec 2.0, XLS-R, wav2vec-U variants, and multilingual/speech-scaling examples).
- Multi-GPU and distributed training (data and model parallel workflows).
- Fast generation on CPU and GPU with multiple decoding strategies: beam search, diverse beam search, top-k/top-p (nucleus) sampling, and lexically constrained decoding.
- Support for large-batch training via gradient accumulation and mixed-precision (FP16) for faster training and reduced memory footprint.
- Full parameter and optimizer state sharding and CPU offloading for training very large models.
- Extensible registries for models, tasks, criterions, optimizers, and learning rate schedulers, plus Hydra-based configuration for flexible experimentation (see the sketch after this list).
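The registry is the main extension point: new models, tasks, criterions, optimizers, and schedulers are added with decorators and then become selectable from the command line or a Hydra config. Below is a minimal sketch loosely following the "Simple LSTM" tutorial in the fairseq docs; the names toy_seq2seq and ToySeq2SeqModel are hypothetical placeholders, and the actual encoder/decoder construction is omitted.

    # Minimal sketch of fairseq's model registry (names are hypothetical).
    from fairseq.models import (
        FairseqEncoderDecoderModel,
        register_model,
        register_model_architecture,
    )

    @register_model("toy_seq2seq")
    class ToySeq2SeqModel(FairseqEncoderDecoderModel):
        @staticmethod
        def add_args(parser):
            # expose model-specific hyperparameters to the CLI / config system
            parser.add_argument("--toy-embed-dim", type=int, default=256)

        @classmethod
        def build_model(cls, args, task):
            # build encoder/decoder from args and the task's dictionaries
            # (omitted here; the point is the registration hooks)
            raise NotImplementedError

    @register_model_architecture("toy_seq2seq", "toy_seq2seq_base")
    def toy_seq2seq_base(args):
        # a named architecture is just a set of default hyperparameters
        args.toy_embed_dim = getattr(args, "toy_embed_dim", 256)

Analogous decorators (register_task, register_criterion, and so on) exist for the other registries.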
Supported tasks and example models
fairseq provides examples and pretrained checkpoints for: machine translation (including WMT recipes), language modeling (Transformer and convolutional LMs), multilingual pretraining and translation (XLM-R, mBART), speech self-supervised pretraining and fine-tuning (wav2vec/wav2vec 2.0, XLS-R), speech-to-speech translation, video-language models (VideoCLIP/VLM), and many research reproductions (RoBERTa, BART, Levenshtein Transformer, non-autoregressive methods, etc.). The repository organizes these paper-driven examples in the examples/ folder to help reproduce or extend published work.
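As a quick illustration of the pretrained translation checkpoints and the decoding strategies listed under Key features, the snippet below loads a WMT'19 English-German model through torch.hub and decodes with beam search and with top-k sampling. It follows the usage shown in the fairseq README; it assumes the sacremoses and fastBPE Python packages are installed for the tokenizer/BPE arguments, and the exact keyword arguments accepted depend on the checkpoint and fairseq version.

    # Sketch: beam search vs. top-k sampling with a pretrained WMT'19 en-de
    # model loaded via torch.hub (the checkpoint is downloaded on first use).
    import torch

    en2de = torch.hub.load(
        "pytorch/fairseq",
        "transformer.wmt19.en-de.single_model",
        tokenizer="moses",
        bpe="fastbpe",
    )
    en2de.eval()

    # Beam search with beam size 5 (the default decoding strategy).
    print(en2de.translate("Machine learning is great!", beam=5))

    # Top-k sampling; the extra keyword arguments are forwarded to the generator.
    print(en2de.translate("Machine learning is great!",
                          beam=1, sampling=True, sampling_topk=10))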
Installation & requirements
- Requires PyTorch (recommended >= 1.10.0) and Python >= 3.8.
- An NVIDIA GPU and NCCL are recommended for training; optional extras include NVIDIA apex for faster mixed-precision training, PyArrow for handling large datasets, and Docker images (when using Docker, increase the shared memory size, e.g. with --ipc=host or --shm-size).
- Typical developer install:
  git clone https://github.com/pytorch/fairseq
  cd fairseq
  pip install --editable ./
  Released versions can instead be installed with pip install fairseq.
Ecosystem & integration
- Pretrained models are available via a convenient torch.hub interface and example scripts (see the snippet after this list).
- Integrates with other projects (for example, xFormers for optimized attention implementations) and is widely used in research reproductions.
- Documentation is hosted on ReadTheDocs and the project maintains community channels (GitHub, Twitter, Google Group) for discussions and support.
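A short example of the torch.hub workflow mentioned above, following the RoBERTa instructions in the repository (the checkpoint is downloaded and cached on first use; the method names are those of fairseq's RoBERTa hub interface):

    # Load a pretrained RoBERTa model and use it for feature extraction
    # and masked-token prediction via fairseq's hub interface.
    import torch

    roberta = torch.hub.load("pytorch/fairseq", "roberta.base")
    roberta.eval()  # disable dropout for deterministic behaviour

    tokens = roberta.encode("fairseq is a sequence modeling toolkit.")
    features = roberta.extract_features(tokens)  # last-layer representations
    print(features.shape)                        # (1, num_tokens, hidden_dim)

    # Predict the masked token with the pretrained LM head.
    print(roberta.fill_mask("fairseq is written in <mask>.", topk=3))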
License & citation
- fairseq is MIT-licensed (the license applies to code and released pretrained models).
- The recommended citation: Ott et al., "fairseq: A Fast, Extensible Toolkit for Sequence Modeling", NAACL-HLT 2019 (Demonstrations).
Notes
- The GitHub project was created on 2017-08-29 and has accumulated many community contributions; over time the repository has grown to include speech, multilingual, and multimodal components in addition to core NLP models.
- fairseq is designed both for research reproduction (reference implementations) and for practical model training at scale, making it a common toolkit in NLP and speech research pipelines.
