AgentsMeetRL

An Awesome List of Agentic Models Trained with Reinforcement Learning

Stars: 461

AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM Agents using reinforcement learning. The criteria for identifying an agent project are multi-turn interactions or tool use. The project is based on code analysis from open-source repositories using GitHub Copilot Agent. The focus is on reinforcement learning frameworks, RL algorithms, rewards, and environments that projects depend on, for everyone's reference on technical choices.

README:

Base Framework General Web GUI Tool Game
Code QA Memory Biomedical Environment

When LLM Agents Meet Reinforcement Learning

AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM Agents using reinforcement learning:

  • 🤖 A project counts as an agent project if it involves multi-turn interactions, tool use, or both, so Tool-Integrated Reasoning (TIR) projects are included in this repo.
  • ⚠️ This list is based on code analysis of open-source repositories using GitHub Copilot Agent, so some entries may be inaccurate. The entries have been manually reviewed, but errors and omissions may remain. If you find any, please let us know through issues or PRs; corrections are warmly welcome!
  • 🚀 We particularly focus on the reinforcement learning frameworks, RL algorithms, rewards, and environments that projects depend on, for everyone's reference on how these excellent open-source projects make their technical choices. See [Click to view technical details] under each table.
  • 🤗 Feel free to submit your own projects anytime - we welcome contributions!
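
The inclusion criterion above can be written as a one-line predicate. This is a minimal sketch, not the maintainers' actual tooling: the `Project` class, its field names, and the three example entries are all hypothetical and exist only to illustrate the "multi-turn or tool use" rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical record for a candidate repository."""
    name: str
    multi_turn: bool   # does training involve multi-turn interaction?
    uses_tools: bool   # does the model call tools (search, code, APIs)?


def is_agent_project(project: Project) -> bool:
    """A project qualifies if it has multi-turn interaction, tool use, or both."""
    return project.multi_turn or project.uses_tools


# Hypothetical candidates, not entries from the tables in this list.
candidates = [
    Project("tir-math-solver", multi_turn=False, uses_tools=True),
    Project("dialogue-negotiator", multi_turn=True, uses_tools=False),
    Project("single-shot-classifier", multi_turn=False, uses_tools=False),
]

included = [p.name for p in candidates if is_agent_project(p)]
print(included)
```

Under this rule a single-turn project still qualifies when it uses tools, which is why TIR projects are listed.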

Enumerations used in the tables:

  • Reward Type:
    • External Verifier: e.g., a compiler or math solver
    • Rule-Based: e.g., a LaTeX parser with exact match scoring
    • Model-Based: e.g., a trained verifier LLM or reward LLM
    • Custom
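
To make the reward categories concrete, here is a rough sketch of how they might be represented, together with one Rule-Based reward that uses exact-match scoring. The `RewardType` enum, the `normalize` helper, and `exact_match_reward` are illustrative names assumed for this example. They are not taken from any listed project, and the normalization is far simpler than a real LaTeX parser.

```python
from enum import Enum


class RewardType(Enum):
    """The four reward categories used in the tables (names are illustrative)."""
    EXTERNAL_VERIFIER = "External Verifier"  # e.g., a compiler or math solver
    RULE_BASED = "Rule-Based"                # e.g., parsing plus exact match
    MODEL_BASED = "Model-Based"              # e.g., a verifier or reward LLM
    CUSTOM = "Custom"


def normalize(answer: str) -> str:
    """Toy normalization: drop whitespace and surrounding '$', lowercase."""
    return "".join(answer.split()).strip("$").lower()


def exact_match_reward(prediction: str, reference: str) -> float:
    """Rule-based outcome reward: 1.0 on exact match after normalization, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


print(exact_match_reward(r"$\frac{1}{2}$", r" \frac{1}{2} "))
print(exact_match_reward("0.5", r"\frac{1}{2}"))
```

The second call scores 0.0 even though the two answers are mathematically equal. That is the usual limitation of exact-match rules, and one reason projects combine them with external verifiers or model-based rewards.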

🔧 Base Framework

Github Repo 🌟 Stars Date Org Paper Link
siiRL Stars 2025.7 Shanghai Innovation Institute Paper
slime 2025.6 Tsinghua University (THUDM) blog
agent-lightning Stars 2025.6 Microsoft Research Paper
AReaL Stars 2025.6 AntGroup/Tsinghua Paper
ROLL Stars 2025.6 Alibaba Paper
MARTI Stars 2025.5 Tsinghua --
RL2 Stars 2025.4 Accio
verifiers Stars 2025.3 Individual --
oat Stars 2024.11 NUS/Sea AI Paper
veRL Stars 2024.10 ByteDance Paper
OpenRLHF Stars 2023.7 OpenRLHF Paper
trl Stars 2019.11 HuggingFace --
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
siiRL PPO/GRPO/CPGD/MARFT Multi Both Multi LLM/VLM/LLM-MAS PostTraining Model/Rule Planned
slime GRPO/GSPO/REINFORCE++ Single Both Both Math/Code External Verifier Yes
agent-lightning PPO/Custom/Automatic Prompt Optimization Multi Outcome Multi Calculator/SQL Model/External/Rule Yes
AReaL PPO Both Outcome Both Math/Code External Yes
ROLL PPO/GRPO/REINFORCE++/TOPR/RAFT++ Multi Both Multi Math/QA/Code/Alignment All Yes
MARTI PPO/GRPO/REINFORCE++/TTRL Multi Both Multi Math All Yes
RL2 Dr. GRPO/PPO/DPO Single Both Both QA/Dialogue Rule/Model/External Yes
verifiers GRPO Multi Outcome Both Reasoning/Math/Code All Code
oat PPO/GRPO Single Outcome Multi Math/Alignment External No
veRL PPO/GRPO Single Outcome Both Math/QA/Reasoning/Search All Yes
OpenRLHF PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO Multi Both Both Dialogue/Chat/Completion Rule/Model/External Yes
trl PPO/GRPO/DPO Single Both Single QA Custom No

💪 General/MultiTask

Github Repo 🌟 Stars Date Org Paper Link RL Framework
AgentGym-RL Stars 2025.9 Fudan University Paper veRL
Agent_Foundation_Models Stars 2025.8 OPPO Personal AI Lab Paper veRL
SPA-RL-Agent Stars 2025.5 PolyU Paper TRL
verl-agent Stars 2025.5 NTU/Skywork Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
AgentGym-RL PPO/GRPO/RLOO/REINFORCE++ Single Outcome Multi Web/Search/Game/Embodied/Science Rule/Model/External Yes (Web, Search, Env APIs)
Agent_Foundation_Models DAPO/PPO Single Outcome Single QA/Code/Math Rule/External Yes
SPA-RL-Agent PPO Single Process Multi Navigation/Web/TextGame Model No
verl-agent PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++ Multi Both Multi Phone Use/Math/Code/Web/TextGame All Yes

🔍 Search/Research/Web

Github Repo 🌟 Stars Date Org Paper Link RL Framework
ASearcher Stars 2025.8 Ant Research RL Lab, Tsinghua University & UW Paper RealHF/AReaL
Kimi-Researcher Stars 2025.6 Moonshot AI blog Custom
TTI Stars 2025.6 CMU Paper Custom
R-Search Stars 2025.6 Individual -- veRL
R1-Searcher-plus Stars 2025.5 RUC Paper Custom
StepSearch Stars 2025.5 SenseTime Paper veRL
AutoRefine Stars 2025.5 USTC Paper veRL
ZeroSearch Stars 2025.5 Alibaba Paper veRL
WebThinker Stars 2025.4 RUC Paper Custom
DeepResearcher Stars 2025.4 SJTU Paper veRL
Search-R1 Stars 2025.3 UIUC/Google paper1, paper2 veRL
R1-Searcher Stars 2025.3 RUC Paper OpenRLHF
C-3PO Stars 2025.2 Alibaba Paper OpenRLHF
WebAgent Stars 2025.1 Alibaba paper1, paper2 LLaMA-Factory
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
ASearcher PPO/GRPO + Decoupled PPO Single Outcome Multi Math/Code/SearchQA External/Rule Yes
Kimi-Researcher REINFORCE Single Outcome Multi Research Outcome Search, Browse, Coding
TTI REINFORCE/BC Single Outcome Multi Web External Web Browsing
R-Search PPO/GRPO Single Both Multi QA/Search All Yes
R1-Searcher-plus Custom Single Outcome Multi Search Model Search
StepSearch PPO Single Process Multi QA Model Search
AutoRefine PPO/GRPO Multi Both Multi RAG QA Rule Search
ZeroSearch PPO/GRPO/REINFORCE Single Outcome Multi QA/Search Rule Yes
WebThinker DPO Single Outcome Multi Reasoning/QA/Research Model/External Web Browsing
DeepResearcher PPO/GRPO Multi Outcome Multi Research All Yes
Search-R1 PPO/GRPO Single Outcome Multi Search All Search
R1-Searcher PPO/DPO Single Both Multi Search All Yes
C-3PO PPO Multi Outcome Multi Search Model Yes
WebAgent DAPO Multi Process Multi Web Model Yes

📱 GUI

Github Repo 🌟 Stars Date Org Paper Link RL Framework
Grounding-R1 Stars 2025.6 Salesforce blog TRL
AgentCPM-GUI Stars 2025.6 OpenBMB/Tsinghua/RUC Paper Huggingface
ARPO Stars 2025.5 CUHK/HKUST Paper veRL
GUI-G1 Stars 2025.5 RUC Paper TRL
GUI-R1 Stars 2025.4 CAS/NUS Paper veRL
UI-R1 Stars 2025.3 vivo/CUHK Paper TRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
Grounding-R1 GRPO Single Outcome Multi GUI Grounding Model Yes
AgentCPM-GUI GRPO Single Outcome Multi Mobile GUI Model Yes
ARPO GRPO Single Outcome Multi GUI External Computer Use
GUI-G1 GRPO Single Outcome Single GUI Rule/External No
GUI-R1 GRPO Single Outcome Multi GUI Rule No
UI-R1 GRPO Single Process Both GUI Rule Computer/Phone Use

🔨 Tool

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MiroRL Stars 2025.8 MiroMindAI HF Repo veRL
verl-tool Stars 2025.6 TIGER-Lab X veRL
Multi-Turn-RL-Agent Stars 2025.5 University of Minnesota Paper Custom
Tool-N1 Stars 2025.5 NVIDIA Paper veRL
Tool-Star Stars 2025.5 RUC Paper LLaMA-Factory
RL-Factory Stars 2025.5 Simple-Efficient model veRL
ReTool Stars 2025.4 ByteDance Paper veRL
AWorld Stars 2025.3 Ant Group (inclusionAI) Paper veRL
Agent-R1 Stars 2025.3 USTC -- veRL
ReCall Stars 2025.3 BaiChuan Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MiroRL GRPO Single Both Multi Reasoning/Planning/ToolUse Rule MCP
verl-tool PPO/GRPO Single Both Both Math/Code Rule/External Yes
Multi-Turn-RL-Agent GRPO Single Both Multi Tool-use/Math Rule/External Yes
Tool-N1 PPO Single Outcome Multi Math/Dialogue All Yes
Tool-Star PPO/DPO/ORPO/SimPO/KTO Single Outcome Multi Multi-modal/Tool Use/Dialogue Model/External Yes
RL-Factory GRPO Multi Both Multi Tool-use/NL2SQL All MCP
ReTool PPO Single Outcome Multi Math External Code
AWorld GRPO Both Outcome Multi Search/Web/Code External/Rule Yes
Agent-R1 PPO/GRPO Single Both Multi Tool-use/QA Model Yes
ReCall PPO/GRPO/RLOO/REINFORCE++/ReMax Single Outcome Multi Tool-use/Math/QA All Yes

🎮 TextGame

Github Repo 🌟 Stars Date Org Paper Link RL Framework
ARIA Stars 2025.6 Fudan University Paper Custom
AMPO Stars 2025.5 Tongyi Lab, Alibaba Paper veRL
Trinity-RFT Stars 2025.5 Alibaba Paper veRL
VAGEN Stars 2025.3 RAGEN-AI Paper veRL
ART Stars 2025.3 OpenPipe Paper TRL
OpenManus-RL Stars 2025.3 UIUC/MetaGPT -- Custom
RAGEN Stars 2025.1 RAGEN-AI Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
ARIA REINFORCE Both Process Multi Negotiation/Bargaining Other No
AMPO BC/AMPO (GRPO improvement) Multi Outcome Multi Social Interaction Model No
Trinity-RFT PPO/GRPO Single Outcome Both Math/TextGame/Web All Yes
VAGEN PPO/GRPO Single Both Multi TextGame/Navigation All Yes
ART GRPO Multi Both Multi TextGame All Yes
OpenManus-RL PPO/DPO/GRPO Multi Outcome Multi TextGame All Yes
RAGEN PPO/GRPO Single Both Multi TextGame All Yes

💻 Code

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MedAgentGym Stars 2025.6 Emory/Georgia Tech Paper Huggingface
CURE Stars 2025.6 University of Chicago, Princeton/ByteDance Paper Huggingface
MASLab Stars 2025.5 MASWorks Paper Custom
Time-R1 Stars 2025.5 UIUC Paper veRL
ML-Agent Stars 2025.5 MASWorks Paper Custom
SkyRL Stars 2025.4 NovaSky -- veRL
digitalhuman Stars 2025.4 Tencent Paper veRL
sweet_rl Stars 2025.3 Meta/UCB Paper OpenRLHF
rllm Stars 2025.1 Berkeley Sky Computing Lab, BAIR / Together AI Notion Blog veRL
open-r1 Stars 2025.1 HuggingFace -- TRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MedAgentGym SFT/DPO/PPO/GRPO Single Outcome Multi Medical/Code External Yes
CURE PPO Single Outcome Single Code External No
MASLab NO RL Multi Outcome Multi Code/Math/Reasoning External Yes
Time-R1 PPO/GRPO/DPO Multi Outcome Multi Temporal All Code
ML-Agent Custom Single Process Multi Code All Yes
SkyRL PPO/GRPO Single Outcome Multi Math/Code All Code
digitalhuman PPO/GRPO/ReMax/RLOO Multi Outcome Multi Empathy/Math/Code/MultimodalQA Rule/Model/External Yes
sweet_rl DPO Multi Process Multi Design/Code Model Web Browsing
rllm PPO/GRPO Single Outcome Multi Code Edit External Yes
open-r1 GRPO Single Outcome Single Math/Code All Yes

🤔 QA(Reasoning/Math)

Github Repo 🌟 Stars Date Org Paper Link RL Framework
ARPO Stars 2025.7 RUC, Kuaishou Paper veRL
terminal-bench-rl Stars 2025.7 Individual (Danau5tin) N/A rLLM
MOTIF Stars 2025.6 University of Maryland Paper TRL
cmriat/l0 Stars 2025.6 CMRIAT Paper veRL
agent-distillation Stars 2025.5 KAIST Paper Custom
VDeepEyes Stars 2025.5 Xiaohongshu/XJTU Paper veRL
EasyR1 Stars 2025.4 Individual repo1/paper2 veRL
AutoCoA Stars 2025.3 BJTU Paper veRL
ToRL Stars 2025.3 SJTU Paper veRL
ReMA Stars 2025.3 SJTU, UCL Paper veRL
Agentic-Reasoning Stars 2025.2 Oxford Paper Custom
SimpleTIR Stars 2025.2 NTU, ByteDance Notion Blog veRL
openrlhf_async_pipline Stars 2024.5 OpenRLHF Paper OpenRLHF
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
ARPO GRPO Single Outcome Multi Math/Coding Model/Rule Yes
terminal-bench-rl GRPO Single Outcome Multi Coding/Terminal Model+External Verifier Yes
MOTIF GRPO Single Outcome Multi QA Rule No
cmriat/l0 PPO Multi Process Multi QA All Yes
agent-distillation PPO Single Process Multi QA/Math External Yes
VDeepEyes PPO/GRPO Multi Process Multi VQA All Yes
EasyR1 GRPO Single Process Multi Vision-Language Model Yes
AutoCoA GRPO Multi Outcome Multi Reasoning/Math/QA All Yes
ToRL GRPO Single Outcome Single Math Rule/External Yes
ReMA PPO Multi Outcome Multi Math Rule No
Agentic-Reasoning Custom Single Process Multi QA/Math External Web Browsing
SimpleTIR PPO/GRPO (with extensions) Single Outcome Multi Math/Coding All Yes
openrlhf_async_pipline PPO/REINFORCE++/DPO/RLOO Single Outcome Multi Dialogue/Reasoning/QA All No

🧠 Memory

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MEM1 Stars 2025.7 MIT Paper veRL (based on Search-R1)
MemAgent Stars 2025.6 ByteDance, Tsinghua-SIA Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MEM1 PPO/GRPO Single Outcome Multi WebShop/GSM8K/QA Rule/Model Yes
MemAgent PPO/GRPO/DPO Multi Outcome Multi Long-context QA Rule/Model/External Yes

🏥 Biomedical

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MMedAgent-RL Stars 2025.8 Unknown paper Unknown
DoctorAgent-RL Stars 2025.5 UCAS/CAS/USTC Paper RAGEN
Biomni Stars 2025.3 Stanford University (SNAP) Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MMedAgent-RL Unknown Multi Unknown Unknown Unknown Unknown Unknown
DoctorAgent-RL GRPO Multi Both Multi Consultation/Diagnosis Model/Rule No
Biomni TBD Single TBD Single scRNAseq/CRISPR/ADMET/Knowledge TBD Yes

⛰️ Environment

Github Repo 🌟 Stars Date Org Task
Mind2Web-2 Stars 2025.6 Ohio State University Web
gem Stars 2025.5 Sea AI Lab Math/Code/Game/QA
MLE-Dojo Stars 2025.5 GIT, Stanford MLE
atropos Stars 2025.4 Nous Research Game/Code/Tool
InternBootcamp Stars 2025.4 InternBootcamp Coding/QA/Game
loong Stars 2025.3 CAMEL-AI.org RLVR
reasoning-gym Stars 2025.1 open-thought Math/Game
llmgym Stars 2025.1 tensorzero TextGame/Tool
debug-gym Stars 2024.11 Microsoft Research Debugging/Game/Code
gym-llm Stars 2024.8 Rodrigo Sánchez Molina Control/Game
AgentGym Stars 2024.6 Fudan Web/Game
tau-bench Stars 2024.6 Sierra Tool
appworld Stars 2024.6 Stony Brook University Phone Use
android_world Stars 2024.5 Google Research Phone Use
TheAgentCompany Stars 2024.3 CMU, Duke Coding
LlamaGym Stars 2024.3 Rohan Pandey Game
visualwebarena Stars 2024.1 CMU Web
LMRL-Gym Stars 2023.12 UC Berkeley Game
OSWorld Stars 2023.10 HKU, CMU, Salesforce, Waterloo Computer Use
webarena Stars 2023.7 CMU Web
AgentBench Stars 2023.7 Tsinghua University Game/Web/QA/Tool
WebShop Stars 2022.7 Princeton-NLP Web
ScienceWorld Stars 2022.3 AllenAI TextGame/ScienceQA
alfworld Stars 2020.10 Microsoft, CMU, UW Embodied
factorio-learning-environment Stars 2021.6 JackHopkins Game
jericho Stars 2018.10 Microsoft, GIT TextGame
TextWorld Stars 2018.6 Microsoft Research TextGame

Under Review/Waiting for Open Source

Citation

If you find this repository useful, please consider citing it:

@misc{agentsMeetRL,
  title={When LLM Agents Meet Reinforcement Learning: A Comprehensive Survey},
  author={AgentsMeetRL Contributors},
  year={2025},
  url={https://github.com/thinkwee/agentsMeetRL}
}

Made with ❤️ by the AgentsMeetRL community
