DeepSpeed is an Apache-2.0-licensed software suite created by Microsoft Research to push the limits of distributed deep learning.
- Extreme scale & speed – ZeRO-based data parallelism, pipeline/tensor/expert parallelism, and NVMe offload let you train dense or sparse Transformer models with billions to trillions of parameters on commodity or cloud hardware.
- Four innovation pillars –
  - DeepSpeed-Training (ZeRO, 3D parallelism, MoE, ZeRO-Infinity)
  - DeepSpeed-Inference (custom kernels, heterogeneous memory, low-latency serving)
  - DeepSpeed-Compression (quantization, pruning, ZeroQuant, XTC)
  - DeepSpeed4Science (AI-for-science system innovations)
- Drop-in for PyTorch – wrap any `torch.nn.Module` with a single API call; integrates with Megatron-LM, Hugging Face, AzureML, and more.
- Developer tooling – autotuner, FLOPs profiler, communication logger, rich tutorials, and an active GitHub community with fortnightly releases.
DeepSpeed has powered headline models such as Megatron-Turing NLG 530B and BLOOM, achieving up to 15× cost reduction and single-digit-millisecond inference latencies compared with prior state-of-the-art systems.