
torchtitan

torchtitan is a PyTorch-native platform for rapid experimentation and large-scale training of generative AI models. It provides multi-dimensional composable parallelisms (FSDP2, tensor/pipeline/context parallel), distributed checkpointing, float8 and MXFP8 support, torch.compile integration, and out-of-the-box support for training Llama 3.1 models. It targets both research and production-scale LLM pretraining.

Introduction

torchtitan — PyTorch-native training platform

torchtitan is a minimal, clean PyTorch-native platform built to accelerate experimentation and production-scale pretraining of generative AI models. The project emphasizes clarity, extensibility, and composability of parallelism techniques, so researchers and engineers can apply multi-dimensional scaling with minimal changes to model code.

Core focus
  • Provide a simple, well-documented codebase demonstrating modern PyTorch distributed features for LLM pretraining and large-scale generative-model training.
  • Enable rapid experimentation via extension points and an "experiments" folder while maintaining production-oriented utilities for checkpointing, profiling, and performance measurement.
Key features
  • Multi-dimensional composable parallelisms (see the first sketch after this list):
    • FSDP2 with per-parameter sharding
    • Tensor Parallel (including async TP)
    • Pipeline Parallel (including optimizations to reduce pipeline bubble)
    • Context Parallel for very long context lengths
  • Meta device model initialization to avoid materializing full model weights on CPU/GPU during setup.
  • Selective and full activation checkpointing to trade compute for memory.
  • Distributed and async checkpointing, with interoperable checkpoint formats that can be loaded by other tools (e.g., torchtune); a DCP-based sketch follows this list.
  • float8 and MXFP8 training support for reduced-precision speedups on supported hardware.
  • Integration with torch.compile for optimized kernels when available.
  • Checkpointable data-loading and built-in C4 dataset configuration; supports custom datasets.
  • Built-in metrics (loss, throughput, TFLOPs, MFU, GPU memory) and logging via TensorBoard or Weights & Biases.
  • Debugging and profiling tools (CPU/GPU profiling, memory profiling, Flight Recorder).
  • Helper scripts for tokenizer download, Llama checkpoint conversion, FSDP/HSDP memory estimation, and distributed inference.
  • Verified performance and convergence reports (benchmarks up to 512 GPUs are provided by the project).
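
These parallelisms compose through PyTorch's DeviceMesh abstraction. The sketch below is illustrative rather than torchtitan's actual code: it builds a toy MLP on the meta device, then applies Tensor Parallel and FSDP2 over a hypothetical 2×2 (data-parallel × tensor-parallel) mesh. It assumes a recent PyTorch launched under torchrun with 4 GPUs; ToyMLP, the mesh sizes, and the init scheme are made up for the example.

    # Minimal sketch (not torchtitan code): Tensor Parallel + FSDP2 on a
    # meta-initialized toy model. Assumes `torchrun --nproc_per_node=4 ...`
    # and a recent PyTorch where these import paths are public.
    import torch
    import torch.nn as nn
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.fsdp import fully_shard                  # FSDP2 API
    from torch.distributed.tensor.parallel import (
        ColwiseParallel, RowwiseParallel, parallelize_module,
    )

    class ToyMLP(nn.Module):                      # hypothetical stand-in model
        def __init__(self, dim: int = 1024):
            super().__init__()
            self.w_in = nn.Linear(dim, 4 * dim, bias=False)
            self.w_out = nn.Linear(4 * dim, dim, bias=False)

        def forward(self, x):
            return self.w_out(torch.relu(self.w_in(x)))

    # 2D mesh: outer dim for FSDP2 data-parallel sharding, inner dim for TP.
    mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))

    # Meta-device init: build the module without allocating real weights.
    with torch.device("meta"):
        model = ToyMLP()

    # Tensor Parallel: shard w_in column-wise and w_out row-wise over "tp".
    parallelize_module(model, mesh["tp"],
                       {"w_in": ColwiseParallel(), "w_out": RowwiseParallel()})

    # FSDP2: per-parameter sharding of the TP-sharded model over "dp".
    fully_shard(model, mesh=mesh["dp"])

    # Materialize real storage on GPU, then initialize the (DTensor) params.
    model.to_empty(device="cuda")
    for p in model.parameters():
        nn.init.normal_(p, std=0.02)

torchtitan drives the same primitives from a TOML train config, which is why model code can stay close to plain single-device PyTorch.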
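
The distributed and async checkpointing support builds on PyTorch Distributed Checkpoint (DCP). The snippet below is a generic DCP sketch rather than torchtitan's own checkpoint wrapper; the single-rank process group, the stand-in Linear model, and the /tmp paths are illustrative. In a real multi-GPU run each rank saves only its own shards, which keeps saving scalable and the resulting format loadable by other DCP-aware tools.

    # Minimal sketch of PyTorch Distributed Checkpoint (DCP), the layer that
    # distributed/async checkpointing builds on. Paths and the stand-in model
    # are illustrative; a 1-rank gloo group is created just for the demo.
    import torch
    import torch.distributed as dist
    import torch.distributed.checkpoint as dcp
    from torch.distributed.checkpoint.state_dict import get_model_state_dict

    if not dist.is_initialized():
        dist.init_process_group("gloo", init_method="tcp://localhost:29500",
                                rank=0, world_size=1)

    model = torch.nn.Linear(16, 16)           # stands in for the sharded model
    state_dict = {"model": get_model_state_dict(model)}

    # Synchronous save: every rank writes its shards under checkpoint_id.
    dcp.save(state_dict, checkpoint_id="/tmp/titan_demo/step-100")

    # Async save (recent PyTorch): returns a Future so training can continue.
    fut = dcp.async_save(state_dict, checkpoint_id="/tmp/titan_demo/step-200")
    fut.result()                              # wait for the write to finish

    # Load back in place into an already-constructed state dict.
    dcp.load(state_dict, checkpoint_id="/tmp/titan_demo/step-100")

    dist.destroy_process_group()

torchtitan layers its own management (save intervals, async staging, and optimizer/dataloader state) on top of this API.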
Model support
  • Out-of-the-box support for training Meta's Llama 3.1 (8B, 70B, 405B) with example train configs and helper scripts.
Installation & usage notes
  • torchtitan is developed against recent PyTorch nightly builds; the README recommends the latest nightly to get the newest features.
  • It can be installed from source, via pre-release nightly packages, or via stable pip/conda releases; each stable release pins a compatible torch version.
  • Provides run scripts, multi-node examples (Slurm/ParallelCluster), and one-command launches to start training (e.g., an 8-GPU Llama 3.1 8B run).
Research & provenance
  • The project is accompanied by a paper accepted to ICLR 2025 and an arXiv submission (arXiv:2410.06511). The README includes citation information and links to the ICLR poster and OpenReview entry.
Community & license
  • Hosted under the pytorch GitHub organization and maintained as an open-source project (BSD-3-Clause). The repository contains contribution guidelines, an experiments folder for new ideas, and a community forum category for distributed/torchtitan discussions.
When to use
  • Use torchtitan if you want a PyTorch-native, minimally opinionated platform to experiment with large-scale LLM pretraining techniques (multi-dimensional parallelism, advanced checkpointing, low-precision formats) while keeping code changes to models small. It is suitable for both research exploration and as a base for production-ready pretraining pipelines.

Information

  • Website: github.com
  • Authors: PyTorch
  • Published: 2023/12/13