AIAny - RL

ReAct: Synergizing Reasoning and Acting in Language Models

2022

Shunyu Yao, Jeffrey Zhao +5

This paper introduces ReAct, an approach that integrates reasoning and acting in large language models (LLMs). ReAct prompts an LLM to generate reasoning traces and task-specific actions in an interleaved manner. Reasoning helps the model induce, track, and update action plans, while actions let it query external sources such as knowledge bases for more information, which mitigates the hallucination and error propagation seen in chain-of-thought reasoning alone.

paper LLM NLP ai-agent google+1
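The interleaved loop described in the ReAct entry above can be illustrated with a minimal sketch. It is not the paper's implementation: the scripted model, the `lookup` action, and the one-entry knowledge base are hypothetical stand-ins, and a real system would prompt an LLM with the running transcript instead.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy knowledge base standing in for an external source.
KNOWLEDGE_BASE: Dict[str, str] = {
    "ReAct": "ReAct was published in 2022.",
}


def lookup(query: str) -> str:
    """Toy action: fetch an observation from the knowledge base."""
    return KNOWLEDGE_BASE.get(query, "No entry found.")


def scripted_model(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM; returns (thought, action_name, action_argument).

    Assumption: a real agent would generate these from the transcript.
    """
    has_observation = any(line.startswith("Observation:") for line in transcript)
    if not has_observation:
        return ("I should look up ReAct first.", "lookup", "ReAct")
    return ("The observation answers the question.", "finish", "2022")


def react_loop(
    question: str,
    model: Callable[[List[str]], Tuple[str, str, str]],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate Thought -> Action -> Observation until the model finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, argument = model(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{argument}]")
        if action == "finish":
            return argument, transcript
        # The observation is fed back so the next thought can use it.
        transcript.append(f"Observation: {lookup(argument)}")
    return "", transcript  # step budget exhausted without an answer


answer, trace = react_loop("When was ReAct published?", scripted_model)
```

Each observation is appended to the transcript before the next thought is produced, so later reasoning can depend on what earlier actions returned.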

Agent Lightning

2025

Microsoft Research

Agent Lightning is an open-source framework developed by Microsoft Research for optimizing and training AI agents using reinforcement learning (RL) and other techniques, supporting integration with any agent framework with minimal code changes.

RL LLM ai-agent microsoft ai-train+3

MiniMind

2024

Jingyao Gong

MiniMind is an open-source GitHub project that lets users train a tiny 26M-parameter LLM from scratch in about 2 hours at a cost of roughly 3 RMB. It provides native PyTorch implementations of tokenizer training, pretraining, supervised fine-tuning (SFT), LoRA, DPO, PPO/GRPO reinforcement learning, and an MoE architecture, plus vision multimodal extensions. It includes high-quality open datasets, supports single-GPU training, and is compatible with Transformers, llama.cpp, and other frameworks, which makes it well suited to LLM beginners.

LLM tutorial github ai-train RL

CleanRL (Clean Implementation of RL Algorithms)

2019

vwxyzjn (GitHub owner), Shengyi Huang +6

CleanRL is a high-quality, single-file implementation library for deep reinforcement learning (Deep RL). It provides compact, research-friendly standalone implementations of many RL algorithms (PPO, DQN, C51, DDPG, TD3, SAC, PPG, etc.), benchmarks, TensorBoard logging, Weights & Biases integration, and cloud/run tooling. It emphasizes readability, reproducibility, and ease of understanding rather than being a modular importable framework.

RL pytorch github ai-library huggingface+3

Qlib

2020

Microsoft Research

Qlib is an open-source, AI-oriented quantitative investment platform from Microsoft that provides a full pipeline for quant research: data processing, feature engineering, model training, backtesting, and serving. It supports supervised learning, market-dynamics modeling, and reinforcement learning, and integrates tools (e.g., RD-Agent) for automated factor mining and model optimization.

microsoft github ai-library mlops ai-workflow+3

labml.ai Deep Learning Paper Implementations

2020

labml.ai (labmlai)

A curated collection of 60+ concise, well-documented PyTorch implementations of deep learning papers from labml.ai. It provides side-by-side notes and tutorials for transformers, optimizers, GANs, RL, diffusion models, vision models and more, intended as learning and reproduction resources.

pytorch paper github tutorial ai-coding+5

Isaac Lab

2022

NVIDIA (Isaac Sim / Omniverse team)

Isaac Lab is an open-source, GPU-accelerated robotics learning framework built on NVIDIA Isaac Sim. It provides high-fidelity physics and sensor simulation, ready-to-train environments and robot models, and integrations for reinforcement and imitation learning workflows to accelerate sim-to-real research and large-scale robot training.

nvidia RL physics ai-framework ai-train+3

Awesome-ML-SYS-Tutorial

2024

zhaochenyang20

A GitHub repository of learning notes and code dedicated to ML + SYS (machine learning systems). It collects tutorials, code walkthroughs, and engineering notes on RLHF, distributed training (FSDP, Megatron), inference and scheduling (SGLang, vLLM), quantization, CUDA/GPU optimization, system design, and practical engineering.

github mlops ai-train pytorch LLM+6

Verifiers: Environments for LLM Reinforcement Learning

2025

Prime Intellect, William Brown

Verifiers is an open-source library from Prime Intellect providing modular components for building reinforcement-learning environments and using them to evaluate and train LLM agents. It includes SingleTurn/MultiTurn environments, ToolEnv for tool-enabled agents, rubric-based reward functions, parsers, and integrations with prime-rl and common inference stacks, covering both small-scale evaluation and large-scale RL training.

RL LLM ai-library ai-train github+1

Tinker Cookbook

2025

Thinking Machines Lab

Tinker Cookbook is an open-source library from Thinking Machines Lab for customizing language models via the Tinker API. It offers realistic fine-tuning examples for supervised learning, reinforcement learning, chat, math reasoning, preference learning, tool use, prompt distillation, and multi-agent setups, along with utilities for rendering, hyperparameters, and evaluation.

github ai-train LLM RL ai-library

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

2025

DeepSeek-AI, Aixin Liu +262

DeepSeek-V3.2 is an open large language model that aims to combine high computational efficiency with strong reasoning and agent capabilities. Key innovations include DeepSeek Sparse Attention (DSA), which reduces attention complexity in long contexts; a scalable reinforcement learning framework with which the authors report performance comparable to GPT-5; and a large-scale agentic task synthesis pipeline for improved generalization in tool-use scenarios.

deepseek LLM paper RL ai-agent

Playing Atari with Deep Reinforcement Learning

2013

Volodymyr Mnih, Koray Kavukcuoglu +5

This DeepMind paper introduced the approach later known as Deep Q-Networks (DQN), the first deep learning model to successfully learn control policies directly from raw pixel input using reinforcement learning. By combining Q-learning with a convolutional neural network and experience replay, it outperformed previous approaches on six of the seven Atari 2600 games tested and surpassed a human expert on three of them, without handcrafted features or game-specific tuning. The result showed that deep networks can learn control from sparse, delayed rewards, catalyzing the modern wave of deep reinforcement learning research and paving the way for later breakthroughs such as AlphaGo.

RL deepmind paper
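The two ingredients named in the entry above, experience replay and the Q-learning target, can be sketched in plain Python. This is a simplified illustration, not the paper's code: the convolutional network is replaced by a hypothetical `toy_q_values` function, states are plain integers, and all names and the discount factor are assumptions chosen for the example.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[int, int, float, int, bool]


class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest are evicted first."""

    def __init__(self, capacity: int) -> None:
        self.storage: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.storage.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.storage), batch_size)


def td_targets(
    batch: List[Transition],
    q_values: Callable[[int], List[float]],
    gamma: float,
) -> List[float]:
    """Q-learning target: r if terminal, else r + gamma * max_a Q(s', a)."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values(next_state))
        targets.append(reward + bootstrap)
    return targets


def toy_q_values(state: int) -> List[float]:
    """Hypothetical stand-in for the network: one value per action."""
    return [0.5 * state, 1.0 * state]


buffer = ReplayBuffer(capacity=3)
for step in [(0, 1, 0.0, 1, False), (1, 0, 0.0, 2, False),
             (2, 1, 1.0, 3, True), (3, 0, 0.0, 4, False)]:
    buffer.push(step)  # the first transition is evicted at capacity 3

minibatch = buffer.sample(2, random.Random(0))
targets = td_targets(minibatch, toy_q_values, gamma=0.9)
```

In a full agent, the network would be trained to move its prediction for each sampled state-action pair toward the corresponding target.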
