Overview
CleanRL is a focused Deep Reinforcement Learning project that provides concise, readable, single-file implementations of popular RL algorithms. The project is intended as a reference and teaching resource for researchers and practitioners who want to understand every implementation detail of an algorithm variant without navigating a large modular codebase.
Key Features
- Single-file implementations: each algorithm variant is presented in a single standalone Python file (for example, ppo_atari.py), making it easy to read and follow every implementation detail.
- Wide algorithm coverage: implementations include PPO, DQN (and DQN variants), C51, SAC, DDPG, TD3, PPG, RND, QDagger, and more, with both classic-control and Atari/Procgen variants.
- Research-friendly tooling: built-in TensorBoard logging, Weights & Biases (wandb) integration for experiment tracking, gameplay video capture, and seeding for reproducibility.
- Benchmarks: CleanRL contributes to the Open RL Benchmark and publishes tracked experiments (see benchmark.cleanrl.dev) to make experimental data transparent and reproducible.
- Multiple runtimes: some implementations include JAX/XLA variants and integrations with environment accelerators like envpool for faster sampling.
- Clear tradeoffs: CleanRL intentionally avoids heavy modularization — it duplicates code across files to keep each file self-contained, simplifying comprehension and debugging at the expense of code reuse.
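The single-file style means even small helpers are written inline rather than imported from a shared module. As an illustration, a linear exploration schedule of the kind used in DQN-style scripts can be written in a few lines (a sketch in the spirit of those files; the exact code in the repository may differ):

```python
def linear_schedule(start_e: float, end_e: float, duration: int, t: int) -> float:
    """Linearly anneal epsilon from start_e to end_e over `duration` steps,
    then hold at end_e. Helpers like this live directly in each script."""
    slope = (end_e - start_e) / duration
    return max(slope * t + start_e, end_e)
```

Because the helper sits in the same file as the training loop, a reader can see exactly how exploration decays without chasing an import.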
Typical Use Cases
- Learning and teaching: researchers, students, and engineers can read single-file implementations to learn exactly how an algorithm is implemented in practice.
- Rapid prototyping: easy to modify a compact file to prototype algorithmic changes or new research ideas.
- Reproducible experiments: integrates with wandb and provides configs and scripts to reproduce experiments and capture metrics/videos.
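The reproducibility point rests on seeding every random number generator a script touches. A toy stand-in (not CleanRL code; real scripts also seed NumPy and PyTorch) shows the principle: with the same seed, the same numbers come out.

```python
import random

def rollout_returns(seed: int, episodes: int = 3) -> list:
    """Toy stand-in for an experiment: seeding the RNG makes every run
    with the same seed produce identical 'returns'."""
    rng = random.Random(seed)
    return [round(sum(rng.random() for _ in range(10)), 6) for _ in range(episodes)]
```

Running `rollout_returns(1)` twice yields identical lists, while a different seed yields different ones, which is what makes logged metrics comparable across reruns.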
Installation & Quick Start
- Requirements: Python >= 3.7.1; optional extras (Atari, MuJoCo, envpool, etc.) add support for additional environments.
- Quick run example (after cloning):
uv pip install .
uv run python cleanrl/ppo.py --seed 1 --env-id CartPole-v0 --total-timesteps 50000
- Supports optional dependency groups for atari, procgen, envpool, JAX, and cloud.
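The flags in the quick-run command map onto per-script hyperparameters. A minimal argparse sketch (hypothetical; newer CleanRL scripts may use a different CLI library) shows how such flags are typically parsed, with `--env-id` becoming the attribute `env_id`:

```python
import argparse

def parse_args(argv=None):
    # Mirrors the flags from the quick-start command; CleanRL-style scripts
    # declare their hyperparameters as CLI flags near the top of each file.
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--env-id", type=str, default="CartPole-v0")
    parser.add_argument("--total-timesteps", type=int, default=50000)
    return parser.parse_args(argv)

args = parse_args(["--seed", "1", "--env-id", "CartPole-v0",
                   "--total-timesteps", "50000"])
```

Keeping hyperparameters as flags (rather than config files) is part of what keeps each script self-contained and easy to sweep over.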
Community, Docs & Citation
- Documentation: https://docs.cleanrl.dev provides algorithm docs, usage examples, and deployment guides.
- Paper: the CleanRL project is associated with a JMLR paper describing the project and motivations.
- Community: GitHub issues, PRs, a Discord server, and a YouTube channel with talks and demos.
Design Philosophy
CleanRL trades off modular API design for clarity: instead of hiding implementation details behind layers and classes, each file contains all necessary code for an algorithm variant. This makes it especially useful as a learning artifact and as a base for researchers who need to inspect or alter low-level behaviors.
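To make the "everything in one file" layout concrete, here is a compressed, hypothetical miniature of such a script (not actual CleanRL code): argument values, seeding, the training loop, and metric tracking all sit in one place.

```python
import random

def train(seed: int = 1, total_timesteps: int = 100) -> dict:
    """Hypothetical miniature of a single-file script: seeding, the step
    loop, and inline metric recording live together with no hidden layers."""
    random.seed(seed)                      # seeding, done inline at startup
    metrics = {"episodic_return": []}      # stands in for TensorBoard logging
    episodic_return = 0.0
    for step in range(total_timesteps):
        reward = random.random()           # stands in for env.step(...)
        episodic_return += reward
        if step % 20 == 19:                # toy "episode" boundary
            metrics["episodic_return"].append(episodic_return)
            episodic_return = 0.0
    return metrics
```

Every low-level decision (when an episode ends, what gets logged, how seeding is done) is visible in the loop itself, which is exactly the property the design philosophy optimizes for.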
Who Should Use It
- Students and newcomers who want to learn RL implementation details.
- Researchers prototyping novel algorithm tweaks.
- Engineers who need compact, debuggable reference implementations rather than production-ready modular libraries.
