Overview
Megatron-LM pioneered intra-layer (tensor) model parallelism and combines it with pipeline parallelism, enabling training of GPT-style models with hundreds of billions of parameters at high GPU efficiency.
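The core idea behind tensor parallelism is to shard a layer's weight matrix across devices so each computes only a slice of the output. A minimal pure-Python sketch (no real GPUs; sizes and helper names are illustrative, not the Megatron API):

```python
# Column-parallel linear layer: the weight matrix is split column-wise
# across "devices", each computes a partial output, and the partial
# results are concatenated (an all-gather in a real implementation).

def matmul(x, w):
    """Multiply row vector x (list) by matrix w (list of rows)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, parts):
    """Split matrix w column-wise into `parts` shards, one per device."""
    cols = len(w[0])
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

x = [1.0, 2.0]                         # input activation
w = [[1.0, 2.0, 3.0, 4.0],             # full 2x4 weight matrix
     [5.0, 6.0, 7.0, 8.0]]

shards = split_columns(w, parts=2)     # shard across 2 "GPUs"
partials = [matmul(x, s) for s in shards]
y = [v for p in partials for v in p]   # concatenate partial outputs

assert y == matmul(x, w)               # matches the unsharded computation
```

Because each shard's columns are independent, no communication is needed during the matmul itself, only when the outputs are gathered.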
Key Capabilities
- Tensor & pipeline parallel APIs with minimal code changes
- Fused LayerNorm, bias-GeLU, and FlashAttention kernels
- Activation recomputation & distributed optimizer sharding
- Megatron Core library for plug-and-play integration
- Extensive examples and Docker images for quick start
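Pipeline parallelism, listed above, splits the model into stages and streams micro-batches through them so stages overlap work rather than idling. A toy schedule simulation, assuming nothing about Megatron's internals (stage functions and counts are made up for illustration):

```python
# GPipe-style pipeline schedule sketch: a batch is split into
# micro-batches that flow through pipeline stages. At timestep t,
# stage s works on micro-batch t - s, so stages run concurrently.

def pipeline_forward(stages, microbatches):
    """Run microbatches through stages, recording (time, stage, mb) events."""
    events = []
    for t in range(len(microbatches) + len(stages) - 1):
        for s, stage in enumerate(stages):
            mb = t - s                      # micro-batch active at stage s
            if 0 <= mb < len(microbatches):
                microbatches[mb] = stage(microbatches[mb])
                events.append((t, s, mb))
    return microbatches, events

stages = [lambda x: x + 1, lambda x: x * 2]   # two toy pipeline stages
outputs, events = pipeline_forward(stages, [1, 2, 3, 4])

assert outputs == [4, 6, 8, 10]               # (x + 1) * 2 per micro-batch
# 4 micro-batches through 2 stages finish in 4 + 2 - 1 = 5 steps,
# versus 8 if each item drained the whole pipeline before the next began.
assert max(t for t, _, _ in events) + 1 == 5
```

More micro-batches shrink the relative size of the startup/drain "bubble", which is why Megatron-style training favors many small micro-batches per global batch.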