Overview
xFormers is a modular toolbox developed by Facebook Research (Meta) that provides customizable and optimized building blocks for Transformer architectures. It's designed for researchers and engineers who need flexible components beyond the primitives provided by mainstream libraries. The project emphasizes research-first design and high performance, including custom CUDA kernels and fused operators where beneficial.
Key Features
- Memory-efficient exact attention: computes standard (exact) attention without materializing the full attention matrix, which can cut memory use and improve speed for many workloads (see the sketch after this list).
- Sparse attention and block-sparse attention primitives for long-context models.
- Fused operators such as fused linear layers, fused layer norm, fused dropout(activation(x+bias)), and fused SwiGLU to reduce kernel launches and improve throughput.
- Custom CUDA kernels with dispatch to other high-performance libraries (when appropriate).
- Modular, composable blocks suitable for vision, NLP, and other domains.
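As a concrete illustration of the attention primitive, here is a minimal sketch that calls the memory-efficient attention operator directly via `xformers.ops`; the shapes, dtype, and sequence length are arbitrary choices for the example, and a CUDA device is assumed.

```python
# Minimal sketch: calling memory-efficient attention directly.
# Inputs use the [batch, seq_len, num_heads, head_dim] layout; scaling by
# 1/sqrt(head_dim) is applied inside the operator.
import torch
import xformers.ops as xops

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = xops.memory_efficient_attention(q, k, v)        # bidirectional attention
causal = xops.memory_efficient_attention(              # causal attention, via a
    q, k, v, attn_bias=xops.LowerTriangularMask())     # structured bias the kernel can exploit
print(out.shape, causal.shape)                          # both [2, 1024, 8, 64]
```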
Installation (summary)
- Recommended (prebuilt wheels, requires compatible PyTorch):
- pip install -U xformers --index-url https://download.pytorch.org/whl/cu126 (CUDA 12.6)
- pip install -U xformers --index-url https://download.pytorch.org/whl/cu128 (CUDA 12.8)
- pip install -U xformers --index-url https://download.pytorch.org/whl/cu129 (CUDA 12.9)
- Development / pre-release:
- pip install --pre -U xformers
- From source (for custom PyTorch versions or custom builds):
- pip install ninja (optional, speeds build)
- pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
Note: building from source may require setting TORCH_CUDA_ARCH_LIST, using compatible NVCC and GCC versions, and having enough memory available for compilation.
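After installing, `python -m xformers.info` reports the detected build and which kernels are usable. A quick Python-level sanity check might look like the following sketch (shapes and dtype are arbitrary test values):

```python
# Sanity-check sketch: confirm the wheel imports against the local PyTorch/CUDA
# runtime and that an attention kernel actually dispatches on this GPU.
import torch
import xformers
import xformers.ops as xops

print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("xformers:", xformers.__version__)

q = k = v = torch.randn(1, 128, 4, 64, device="cuda", dtype=torch.float16)
# Raises at call time if no suitable kernel exists for this dtype/device/head size.
print(xops.memory_efficient_attention(q, k, v).shape)
```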
Typical Use Cases
- Research experiments that require non-standard or novel attention mechanisms.
- Performance-sensitive Transformer training / fine-tuning where fused kernels and memory-efficient attention can reduce GPU memory and runtime.
- Prototyping new Transformer blocks by composing xFormers components with standard PyTorch modules, without boilerplate (see the sketch after this list).
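To illustrate that last point, the hypothetical block below mixes standard PyTorch layers with the xFormers attention operator; the `Block` class and its dimensions are invented for this example and are not something shipped by the library.

```python
# Hypothetical minimal Transformer block: standard PyTorch layers around
# xFormers memory-efficient attention (pre-norm, GELU MLP).
import torch
import torch.nn as nn
import xformers.ops as xops

class Block(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                      # x: [batch, seq_len, dim]
        b, m, _ = x.shape
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        # Reshape to the [batch, seq_len, heads, head_dim] layout the operator expects.
        q, k, v = (t.reshape(b, m, self.heads, self.head_dim) for t in (q, k, v))
        attn = xops.memory_efficient_attention(q, k, v)
        x = x + self.proj(attn.reshape(b, m, -1))
        return x + self.mlp(self.norm2(x))

x = torch.randn(2, 256, 512, device="cuda", dtype=torch.float16)
print(Block().to("cuda", torch.float16)(x).shape)   # [2, 256, 512]
```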
Benchmarks & Performance
The project provides benchmark plots (e.g., memory-efficient multi-head attention vs. standard implementations) showing notable speed and memory advantages for certain workloads; the A100 results referenced in the repo are one example. Actual gains depend on model configuration, dtype, hardware, and whether the xFormers custom kernels are available and selected.
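These claims are easy to spot-check informally. The sketch below (not the repo's benchmark harness; sizes and dtype are arbitrary) compares peak GPU memory of the memory-efficient operator against a naive implementation that materializes the full attention matrix:

```python
# Informal peak-memory comparison: naive attention materializes a
# [batch, heads, seq, seq] matrix, while the memory-efficient kernel does not.
import torch
import xformers.ops as xops

B, M, H, K = 4, 2048, 16, 64
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

def naive_attention(q, k, v):
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))              # [B, H, M, K]
    attn = torch.softmax(q @ k.transpose(-2, -1) / K ** 0.5, -1)  # [B, H, M, M]
    return (attn @ v).transpose(1, 2)                              # back to [B, M, H, K]

for name, fn in [("naive", naive_attention),
                 ("memory_efficient", xops.memory_efficient_attention)]:
    torch.cuda.reset_peak_memory_stats()
    out = fn(q, k, v)
    torch.cuda.synchronize()
    print(f"{name}: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB peak")
```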
Compatibility & Requirements
- Built to be used with PyTorch (instructions reference specific PyTorch versions). Prebuilt wheels assume a matching CUDA / PyTorch runtime.
- Provides guidance for troubleshooting builds (NVCC vs CUDA runtime, GCC compatibility, TORCH_CUDA_ARCH_LIST, MAX_JOBS for ninja builds, long path issues on Windows).
License & Citation
- License: BSD-style license (see LICENSE in the repo). The codebase reuses code from, or is inspired by, several other projects (e.g., Triton, Flash-Attention, CUTLASS).
- Citation: the repository includes a BibTeX entry that authors can use when referencing xFormers in publications.
When to Choose xFormers
Choose xFormers when you need flexible, research-oriented Transformer blocks with performance optimizations that are not yet available in mainstream libraries, or when you want to experiment with alternative attention mechanisms (sparse, block-sparse, memory-efficient exact attention) with reduced engineering overhead.
