Agent Reinforcement Trainer (ART)

ART (Agent Reinforcement Trainer) is an open-source reinforcement learning framework for training multi-step, LLM-driven agents using GRPO. It supports models such as Qwen2.5, Qwen3 and Llama, offers a serverless W&B Training integration to handle infrastructure, includes example notebooks and benchmarks, and provides a client/server training loop that saves LoRA checkpoints and enables instant inference deployment.

Introduction

Overview

ART (Agent Reinforcement Trainer) is an open-source RL framework designed to make it easy to train multi-step agents driven by large language models. ART focuses on letting language models learn from experience using GRPO, providing a reproducible and ergonomic harness that integrates into Python applications. It is built to work with vLLM/HuggingFace-transformers-compatible causal models and is demonstrated with Qwen2.5, Qwen3 and various LLaMA-family models.

Key features
  • Serverless W&B Training: a managed backend integration that provisions training and inference infrastructure automatically; advertised benefits include lower cost, faster training, and instant deployment of checkpoints to W&B Inference.
  • GRPO-based training loop: a client/server architecture where the client runs agent rollouts and the server performs GRPO updates, saves LoRA checkpoints, and reloads updated weights into the inference engine.
  • Model and infra flexibility: works with vLLM and most HF-transformers compatible causal LMs; can run locally on GPUs or via the serverless backend.
  • Examples and notebooks: multiple ready-to-run notebooks (2048, email search agent ART•E, Tic Tac Toe, Codenames, MCP•RL, AutoRL, Temporal Clue, etc.) demonstrate tasks, benchmarks, and how to iterate quickly.
  • Observability and integrations: integrates with W&B and other observability tooling to simplify debugging and monitoring training progress.

How the training loop works
  1. Inference: the ART client performs agentic workflows and records Trajectories (system/user/assistant messages) while routing completion requests to the ART server.
  2. Rewarding: when rollouts finish, your code assigns rewards to Trajectories to indicate performance.
  3. Training: finished Trajectories are grouped and sent to the server; the server runs a GRPO update (initializing a LoRA from the latest checkpoint, or an empty LoRA if none exists), saves the new LoRA, and loads it into vLLM so inference resumes with the updated policy.

This loop repeats until a configured number of iterations completes, letting the LLM improve from experience in an agentic setting.
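
To make the three phases concrete, here is a minimal sketch in the style of the project's notebooks. It assumes the openpipe-art client API (art.TrainableModel, art.Trajectory, art.TrajectoryGroup, art.gather_trajectory_groups, model.openai_client(), model.train()) behaves as in the README examples; the scenario list and the score_response reward function are hypothetical placeholders, and exact names or import paths may differ between releases.

```python
import asyncio

import art
from art.local import LocalBackend  # local GPU backend; import path may vary by release


def score_response(answer: str, scenario: str) -> float:
    # Hypothetical reward function: 1.0 if the answer mentions the scenario keyword.
    return 1.0 if scenario.split()[-1].lower() in answer.lower() else 0.0


async def rollout(model: art.TrainableModel, scenario: str) -> art.Trajectory:
    # 1. Inference: run the agentic workflow, recording messages in a Trajectory
    #    while completion requests are routed to the ART server's vLLM engine.
    client = model.openai_client()
    trajectory = art.Trajectory(
        messages_and_choices=[
            {"role": "system", "content": "You are a helpful agent."},
            {"role": "user", "content": scenario},
        ],
        reward=0.0,
    )
    completion = await client.chat.completions.create(
        model=model.name,
        messages=trajectory.messages(),
        max_completion_tokens=256,
    )
    choice = completion.choices[0]
    trajectory.messages_and_choices.append(choice)

    # 2. Rewarding: once the rollout finishes, your code assigns a reward.
    trajectory.reward = score_response(choice.message.content or "", scenario)
    return trajectory


async def main() -> None:
    model = art.TrainableModel(
        name="agent-001",
        project="example-task",
        base_model="Qwen/Qwen2.5-7B-Instruct",
    )
    await model.register(LocalBackend())  # or the serverless W&B Training backend

    scenarios = ["find the invoice email", "find the travel itinerary"]  # placeholder tasks
    for _ in range(10):
        # 3. Training: grouped Trajectories go to the server, which runs a GRPO
        #    update, saves a new LoRA checkpoint, and reloads it into vLLM.
        groups = await art.gather_trajectory_groups(
            art.TrajectoryGroup(rollout(model, s) for _ in range(8)) for s in scenarios
        )
        await model.train(groups)


if __name__ == "__main__":
    asyncio.run(main())
```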

Typical uses and strengths
  • Training LLM-based agents for real-world multi-step tasks (email search, game playing, tool use).
  • Rapid iteration via the serverless W&B Training backend, reducing DevOps overhead and accelerating feedback.
  • Research and prototyping of RL for LLMs with reproducible notebooks and benchmark visualizations.

Installation & quick start
  • Install via pip: pip install openpipe-art.
  • Example (serverless): register a TrainableModel with a ServerlessBackend (authenticated via a W&B API key) to start experiments without managing GPU infrastructure; see the sketch below.
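
A minimal serverless quick start might look like the following. The ServerlessBackend import path and constructor are assumptions based on the README pattern; the backend expects a W&B API key (e.g. via the WANDB_API_KEY environment variable).

```python
import asyncio

import art
from art.serverless.backend import ServerlessBackend  # assumed import path; check the docs


async def main() -> None:
    model = art.TrainableModel(
        name="agent-001",
        project="email-search-agent",
        base_model="Qwen/Qwen2.5-14B-Instruct",
    )
    # The serverless backend provisions training and inference infrastructure via W&B Training.
    await model.register(ServerlessBackend())


asyncio.run(main())
```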

Supported models & limits
  • Designed for vLLM/HF-transformers-compatible causal LMs; the repository docs currently note Gemma 3 as unsupported. Users should verify compatibility with their chosen model.

Community, license & citation
  • The project is open-source under the Apache-2.0 License. Contributions are welcome and the repository includes contribution guidelines.
  • Citation provided in the repo (Hilton et al., 2025) for academic/reference use.

Where to learn more
  • Official docs and integrations: https://art.openpipe.ai
  • GitHub repository (source code, examples, notebooks): linked under Information below.

(Adapted from the project's README and documentation.)

Information

  • Website: github.com
  • Authors: Brad Hilton, Kyle Corbitt, David Corbitt, Saumya Gandhi, Angky William, Bohdan Kovalenskyi, Andie Jones, OpenPipe
  • Published date: 2025/03/10