
Megatron-LM

NVIDIA’s model-parallel training library for GPT-like transformers at multi-billion-parameter scale.

Overview

Megatron-LM pioneered tensor (intra-layer) model parallelism and later combined it with pipeline parallelism, enabling training of GPT-style models with hundreds of billions of parameters at high GPU efficiency.
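
The core idea behind tensor parallelism is to shard individual weight matrices across GPUs so each device computes only a slice of a layer's output. Below is a minimal, self-contained sketch in plain PyTorch of a column-parallel linear layer; it is illustrative only and does not use Megatron's actual API (the two-way split and the names `w0`/`w1` are assumptions for the example):

```python
import torch

# Illustrative sketch of column-parallel tensor parallelism: the weight's
# output dimension is sharded across "devices"; each shard computes its
# slice of the output independently, and a concat (an all-gather in a real
# multi-GPU setup) recovers the full result.

torch.manual_seed(0)
x = torch.randn(4, 8)    # batch of input activations
w = torch.randn(8, 16)   # full (unsharded) weight matrix

# Shard the weight column-wise across two hypothetical devices.
w0, w1 = w.chunk(2, dim=1)

# Each shard's matmul needs no communication; concatenation stitches
# the partial outputs back together.
y_parallel = torch.cat([x @ w0, x @ w1], dim=1)
y_full = x @ w

assert torch.allclose(y_parallel, y_full)
```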

Key Capabilities
  • Tensor- and pipeline-parallel APIs that require minimal code changes
  • Fused LayerNorm, bias-GeLU, and FlashAttention kernels
  • Activation recomputation and distributed optimizer sharding (see the sketch after this list)
  • Megatron Core library for plug-and-play integration
  • Extensive examples and Docker images for a quick start
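
Activation recomputation trades compute for memory: a block's intermediate activations are discarded during the forward pass and recomputed on demand during backward. The idea can be illustrated with stock PyTorch's checkpointing utility; this is a conceptual sketch only, not Megatron-LM's own (fused, selective) recomputation implementation, and the layer sizes are arbitrary:

```python
import torch
from torch.utils.checkpoint import checkpoint

# A transformer-MLP-shaped block whose inner activations we choose
# not to keep in memory.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
)

x = torch.randn(8, 512, requires_grad=True)

# Checkpointed forward: activations inside `block` are not stored;
# they are recomputed when `loss.backward()` runs.
y = checkpoint(block, x, use_reentrant=False)
loss = y.sum()
loss.backward()
```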
