
AgentsMeetRL
An Awesome List of Agentic Models Trained with Reinforcement Learning
Stars: 472

AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM agents with reinforcement learning. A project qualifies as an agent project if it features multi-turn interaction or tool use. The list is built from code analysis of the open-source repositories using GitHub Copilot Agent, and it focuses on the reinforcement learning frameworks, RL algorithms, rewards, and environments each project depends on, as a reference for technical choices.
README:
AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM Agents using reinforcement learning:
- 🤖 The criterion for identifying an agent project is that it supports at least one of the following: multi-turn interaction or tool use (so TIR projects, Tool-Integrated Reasoning, are in scope for this repo).
- ⚠️ This project is based on code analysis of open-source repositories using GitHub Copilot Agent, which may produce unfaithful results. Although entries are manually reviewed, omissions may remain. If you find any errors, please don't hesitate to let us know through issues or PRs - we warmly welcome them!
- 🚀 We particularly focus on the reinforcement learning frameworks, RL algorithms, rewards, and environments each project depends on, as a reference for how these excellent open-source projects make their technical choices. See "📋 Click to view technical details" under each table.
- 🤗 Feel free to submit your own projects anytime - we welcome contributions!
Some enumerations used in the tables:
- Reward Type:
  - External Verifier: e.g., a compiler or math solver
  - Rule-Based: e.g., a LaTeX parser with exact-match scoring
  - Model-Based: e.g., a trained verifier LLM or reward LLM
  - Custom
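These reward types can be pictured with a few function skeletons. The sketch below is our own illustration, not code from any listed project; the function names, the `\boxed{}` answer convention, and the `run_tests`/`score_with_llm` callables are all hypothetical:

```python
import re

def rule_based_reward(completion: str, gold: str) -> float:
    """Rule-Based: parse a final boxed answer and score by exact match."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if match and match.group(1).strip() == gold.strip() else 0.0

def external_verifier_reward(program: str, run_tests) -> float:
    """External Verifier: delegate scoring to e.g. a compiler or test harness.
    `run_tests` is an assumed callable returning the fraction of tests passed."""
    return run_tests(program)

def model_based_reward(completion: str, score_with_llm) -> float:
    """Model-Based: ask a trained verifier/reward LLM for a scalar score.
    `score_with_llm` is an assumed callable wrapping the reward model."""
    return score_with_llm(completion)
```

A Custom reward is simply any scoring function outside these three shapes, often a task-specific mix of them.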
Github Repo | Date | Org | Paper Link
---|---|---|---
siiRL | 2025.7 | Shanghai Innovation Institute | Paper
slime | 2025.6 | Tsinghua University (THUDM) | Blog
agent-lightning | 2025.6 | Microsoft Research | Paper
AReaL | 2025.6 | Ant Group/Tsinghua | Paper
ROLL | 2025.6 | Alibaba | Paper
MARTI | 2025.5 | Tsinghua | --
RL2 | 2025.4 | Accio | --
verifiers | 2025.3 | Individual | --
oat | 2024.11 | NUS/Sea AI | Paper
veRL | 2024.10 | ByteDance | Paper
OpenRLHF | 2023.7 | OpenRLHF | Paper
trl | 2019.11 | HuggingFace | --
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
siiRL | PPO/GRPO/CPGD/MARFT | Multi | Both | Multi | LLM/VLM/LLM-MAS PostTraining | Model/Rule | Planned |
slime | GRPO/GSPO/REINFORCE++ | Single | Both | Both | Math/Code | External Verifier | Yes |
agent-lightning | PPO/Custom/Automatic Prompt Optimization | Multi | Outcome | Multi | Calculator/SQL | Model/External/Rule | Yes |
AReaL | PPO | Both | Outcome | Both | Math/Code | External | Yes |
ROLL | PPO/GRPO/Reinforce++/TOPR/RAFT++ | Multi | Both | Multi | Math/QA/Code/Alignment | All | Yes |
MARTI | PPO/GRPO/REINFORCE++/TTRL | Multi | Both | Multi | Math | All | Yes |
RL2 | Dr. GRPO/PPO/DPO | Single | Both | Both | QA/Dialogue | Rule/Model/External | Yes |
verifiers | GRPO | Multi | Outcome | Both | Reasoning/Math/Code | All | Code |
oat | PPO/GRPO | Single | Outcome | Multi | Math/Alignment | External | No |
veRL | PPO/GRPO | Single | Outcome | Both | Math/QA/Reasoning/Search | All | Yes |
OpenRLHF | PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO | Multi | Both | Both | Dialogue/Chat/Completion | Rule/Model/External | Yes |
trl | PPO/GRPO/DPO | Single | Both | Single | QA | Custom | No |
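GRPO appears in nearly every table in this list. As a rough sketch of the core idea (a minimal illustration of group-relative advantage estimation, not any listed framework's actual implementation), it replaces a learned critic by normalizing the outcome rewards of several rollouts sampled for the same prompt against each other:

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: z-score each rollout's reward against the
    other rollouts for the same prompt, so no value network is needed."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

Rollouts that beat their group mean get positive advantages and are reinforced; the rest are suppressed. Variants in the tables (GSPO, Dr. GRPO, GiGPO, DAPO, ...) mostly change how this normalization or the accompanying clipping is done.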
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
AgentGym-RL | 2025.9 | Fudan University | Paper | veRL
Agent_Foundation_Models | 2025.8 | OPPO Personal AI Lab | Paper | veRL
SPA-RL-Agent | 2025.5 | PolyU | Paper | TRL
verl-agent | 2025.5 | NTU/Skywork | Paper | veRL
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
AgentGym-RL | PPO/GRPO/RLOO/REINFORCE++ | Single | Outcome | Multi | Web/Search/Game/Embodied/Science | Rule/Model/External | Yes (Web, Search, Env APIs) |
Agent_Foundation_Models | DAPO/PPO | Single | Outcome | Single | QA/Code/Math | Rule/External | Yes |
SPA-RL-Agent | PPO | Single | Process | Multi | Navigation/Web/TextGame | Model | No |
verl-agent | PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++ | Multi | Both | Multi | Phone Use/Math/Code/Web/TextGame | All | Yes |
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
ASearcher | 2025.8 | Ant Research RL Lab/Tsinghua University & UW | Paper | RealHF/AReaL
Kimi-Researcher | 2025.6 | Moonshot AI | Blog | Custom
TTI | 2025.6 | CMU | Paper | Custom
R-Search | 2025.6 | Individual | -- | veRL
R1-Searcher-plus | 2025.5 | RUC | Paper | Custom
StepSearch | 2025.5 | SenseTime | Paper | veRL
AutoRefine | 2025.5 | USTC | Paper | veRL
ZeroSearch | 2025.5 | Alibaba | Paper | veRL
WebThinker | 2025.4 | RUC | Paper | Custom
DeepResearcher | 2025.4 | SJTU | Paper | veRL
Search-R1 | 2025.3 | UIUC/Google | Paper 1, Paper 2 | veRL
R1-Searcher | 2025.3 | RUC | Paper | OpenRLHF
C-3PO | 2025.2 | Alibaba | Paper | OpenRLHF
WebAgent | 2025.1 | Alibaba | Paper 1, Paper 2 | LLaMA-Factory
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
ASearcher | PPO/GRPO + Decoupled PPO | Single | Outcome | Multi | Math/Code/SearchQA | External/Rule | Yes |
Kimi-Researcher | REINFORCE | Single | Outcome | Multi | Research | Outcome | Search, Browse, Coding |
TTI | REINFORCE/BC | Single | Outcome | Multi | Web | External | Web Browsing |
R-Search | PPO/GRPO | Single | Both | Multi | QA/Search | All | Yes |
R1-Searcher-plus | Custom | Single | Outcome | Multi | Search | Model | Search |
StepSearch | PPO | Single | Process | Multi | QA | Model | Search |
AutoRefine | PPO/GRPO | Multi | Both | Multi | RAG QA | Rule | Search |
ZeroSearch | PPO/GRPO/REINFORCE | Single | Outcome | Multi | QA/Search | Rule | Yes |
WebThinker | DPO | Single | Outcome | Multi | Reasoning/QA/Research | Model/External | Web Browsing |
DeepResearcher | PPO/GRPO | Multi | Outcome | Multi | Research | All | Yes |
Search-R1 | PPO/GRPO | Single | Outcome | Multi | Search | All | Search |
R1-Searcher | PPO/DPO | Single | Both | Multi | Search | All | Yes |
C-3PO | PPO | Multi | Outcome | Multi | Search | Model | Yes |
WebAgent | DAPO | Multi | Process | Multi | Web | Model | Yes |
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
MobileAgent | 2025.9 | X-PLUG (TongyiQwen) | Paper | veRL
InfiGUI-G1 | 2025.8 | InfiX AI | Paper | veRL
Grounding-R1 | 2025.6 | Salesforce | Blog | trl
AgentCPM-GUI | 2025.6 | OpenBMB/Tsinghua/RUC | Paper | Hugging Face
ARPO | 2025.5 | CUHK/HKUST | Paper | veRL
GUI-G1 | 2025.5 | RUC | Paper | TRL
GUI-R1 | 2025.4 | CAS/NUS | Paper | veRL
UI-R1 | 2025.3 | vivo/CUHK | Paper | TRL
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
MobileAgent | semi-online RL | Single | Both | Multi | MobileGUI/Automation | Rule | Yes |
InfiGUI-G1 | AEPO | Single | Outcome | Single | GUI/Grounding | Rule | No |
Grounding-R1 | GRPO | Single | Outcome | Multi | GUI Grounding | Model | Yes |
AgentCPM-GUI | GRPO | Single | Outcome | Multi | Mobile GUI | Model | Yes |
ARPO | GRPO | Single | Outcome | Multi | GUI | External | Computer Use |
GUI-G1 | GRPO | Single | Outcome | Single | GUI | Rule/External | No |
GUI-R1 | GRPO | Single | Outcome | Multi | GUI | Rule | No |
UI-R1 | GRPO | Single | Process | Both | GUI | Rule | Computer/Phone Use |
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
MiroRL | 2025.8 | MiroMindAI | HF Repo | veRL
verl-tool | 2025.6 | TIGER-Lab | X | veRL
Multi-Turn-RL-Agent | 2025.5 | University of Minnesota | Paper | Custom
Tool-N1 | 2025.5 | NVIDIA | Paper | veRL
Tool-Star | 2025.5 | RUC | Paper | LLaMA-Factory
RL-Factory | 2025.5 | Simple-Efficient | Model | veRL
ReTool | 2025.4 | ByteDance | Paper | veRL
AWorld | 2025.3 | Ant Group (inclusionAI) | Paper | veRL
Agent-R1 | 2025.3 | USTC | -- | veRL
ReCall | 2025.3 | BaiChuan | Paper | veRL
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
MiroRL | GRPO | Single | Both | Multi | Reasoning/Planning/ToolUse | Rule | MCP
verl-tool | PPO/GRPO | Single | Both | Both | Math/Code | Rule/External | Yes |
Multi-Turn-RL-Agent | GRPO | Single | Both | Multi | Tool-use/Math | Rule/External | Yes |
Tool-N1 | PPO | Single | Outcome | Multi | Math/Dialogue | All | Yes |
Tool-Star | PPO/DPO/ORPO/SimPO/KTO | Single | Outcome | Multi | Multi-modal/Tool Use/Dialogue | Model/External | Yes |
RL-Factory | GRPO | Multi | Both | Multi | Tool-use/NL2SQL | All | MCP |
ReTool | PPO | Single | Outcome | Multi | Math | External | Code |
AWorld | GRPO | Both | Outcome | Multi | Search/Web/Code | External/Rule | Yes |
Agent-R1 | PPO/GRPO | Single | Both | Multi | Tool-use/QA | Model | Yes |
ReCall | PPO/GRPO/RLOO/REINFORCE++/ReMax | Single | Outcome | Multi | Tool-use/Math/QA | All | Yes |
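The tool-use projects above share a common rollout shape: the policy alternates between generating text and invoking a tool until it emits a final answer, with reward typically assigned at the end of the trajectory. Below is a hypothetical minimal loop; the `<tool>`/`<answer>` tag convention and the function names are our own illustration, not any listed project's actual protocol:

```python
import re

def rollout(policy, tools: dict, prompt: str, max_turns: int = 8):
    """Multi-turn tool-use rollout: feed tool results back into the context
    until the policy emits a final answer or the turn budget is exhausted."""
    context = prompt
    for _ in range(max_turns):
        step = policy(context)                      # one generation turn
        context += step
        call = re.search(r"<tool>(\w+):(.*?)</tool>", step, re.S)
        if call:                                    # execute the named tool
            name, arg = call.group(1), call.group(2)
            result = tools[name](arg)
            context += f"\n<result>{result}</result>\n"
            continue
        done = re.search(r"<answer>(.*?)</answer>", step, re.S)
        if done:                                    # terminal turn
            return done.group(1), context
    return None, context
```

The returned context is the full trajectory that the RL algorithm then scores (Outcome reward on the final answer, or Process rewards on intermediate tool calls).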
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
ARIA | 2025.6 | Fudan University | Paper | Custom
AMPO | 2025.5 | Tongyi Lab, Alibaba | Paper | veRL
Trinity-RFT | 2025.5 | Alibaba | Paper | veRL
VAGEN | 2025.3 | RAGEN-AI | Paper | veRL
ART | 2025.3 | OpenPipe | Paper | TRL
OpenManus-RL | 2025.3 | UIUC/MetaGPT | -- | Custom
RAGEN | 2025.1 | RAGEN-AI | Paper | veRL
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
ARIA | REINFORCE | Both | Process | Multi | Negotiation/Bargaining | Other | No |
AMPO | BC/AMPO(GRPO improvement) | Multi | Outcome | Multi | Social Interaction | Model-based | No |
Trinity-RFT | PPO/GRPO | Single | Outcome | Both | Math/TextGame/Web | All | Yes |
VAGEN | PPO/GRPO | Single | Both | Multi | TextGame/Navigation | All | Yes |
ART | GRPO | Multi | Both | Multi | TextGame | All | Yes |
OpenManus-RL | PPO/DPO/GRPO | Multi | Outcome | Multi | TextGame | All | Yes |
RAGEN | PPO/GRPO | Single | Both | Multi | TextGame | All | Yes |
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
RepoDeepSearch | 2025.8 | PKU, Bytedance, BIT | Paper | veRL
MedAgentGym | 2025.6 | Emory/Georgia Tech | Paper | Hugging Face
CURE | 2025.6 | University of Chicago/Princeton/ByteDance | Paper | Hugging Face
MASLab | 2025.5 | MASWorks | Paper | Custom
Time-R1 | 2025.5 | UIUC | Paper | veRL
ML-Agent | 2025.5 | MASWorks | Paper | Custom
SkyRL | 2025.4 | NovaSky | -- | veRL
digitalhuman | 2025.4 | Tencent | Paper | veRL
sweet_rl | 2025.3 | Meta/UCB | Paper | OpenRLHF
rllm | 2025.1 | Berkeley Sky Computing Lab (BAIR)/Together AI | Notion Blog | veRL
open-r1 | 2025.1 | HuggingFace | -- | TRL
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
RepoDeepSearch | GRPO | Single | Both | Multi | Search/Repair | Rule/External | Yes |
MedAgentGym | SFT/DPO/PPO/GRPO | Single | Outcome | Multi | Medical/Code | External | Yes |
CURE | PPO | Single | Outcome | Single | Code | External | No |
MASLab | NO RL | Multi | Outcome | Multi | Code/Math/Reasoning | External | Yes |
Time-R1 | PPO/GRPO/DPO | Multi | Outcome | Multi | Temporal | All | Code |
ML-Agent | Custom | Single | Process | Multi | Code | All | Yes |
SkyRL | PPO/GRPO | Single | Outcome | Multi | Math/Code | All | Code |
digitalhuman | PPO/GRPO/ReMax/RLOO | Multi | Outcome | Multi | Empathy/Math/Code/MultimodalQA | Rule/Model/External | Yes |
sweet_rl | DPO | Multi | Process | Multi | Design/Code | Model | Web Browsing |
rllm | PPO/GRPO | Single | Outcome | Multi | Code Edit | External | Yes |
open-r1 | GRPO | Single | Outcome | Single | Math/Code | All | Yes |
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
ARPO | 2025.7 | RUC, Kuaishou | Paper | veRL
terminal-bench-rl | 2025.7 | Individual (Danau5tin) | N/A | rLLM
MOTIF | 2025.6 | University of Maryland | Paper | trl
cmriat/l0 | 2025.6 | CMRIAT | Paper | veRL
agent-distillation | 2025.5 | KAIST | Paper | Custom
VDeepEyes | 2025.5 | Xiaohongshu/XJTU | Paper | veRL
EasyR1 | 2025.4 | Individual | Repo 1/Paper 2 | veRL
AutoCoA | 2025.3 | BJTU | Paper | veRL
ToRL | 2025.3 | SJTU | Paper | veRL
ReMA | 2025.3 | SJTU, UCL | Paper | veRL
Agentic-Reasoning | 2025.2 | Oxford | Paper | Custom
SimpleTIR | 2025.2 | NTU, Bytedance | Notion Blog | veRL
openrlhf_async_pipline | 2024.5 | OpenRLHF | Paper | OpenRLHF
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
ARPO | GRPO | Single | Outcome | Multi | Math/Coding | Model/Rule | Yes |
terminal-bench-rl | GRPO | Single | Outcome | Multi | Coding/Terminal | Model+External Verifier | Yes |
MOTIF | GRPO | Single | Outcome | Multi | QA | Rule | No |
cmriat/l0 | PPO | Multi | Process | Multi | QA | All | Yes |
agent-distillation | PPO | Single | Process | Multi | QA/Math | External | Yes |
VDeepEyes | PPO/GRPO | Multi | Process | Multi | VQA | All | Yes |
EasyR1 | GRPO | Single | Process | Multi | Vision-Language | Model | Yes |
AutoCoA | GRPO | Multi | Outcome | Multi | Reasoning/Math/QA | All | Yes |
ToRL | GRPO | Single | Outcome | Single | Math | Rule/External | Yes |
ReMA | PPO | Multi | Outcome | Multi | Math | Rule | No |
Agentic-Reasoning | Custom | Single | Process | Multi | QA/Math | External | Web Browsing |
SimpleTIR | PPO/GRPO (with extensions) | Single | Outcome | Multi | Math, Coding | All | Yes |
openrlhf_async_pipline | PPO/REINFORCE++/DPO/RLOO | Single | Outcome | Multi | Dialogue/Reasoning/QA | All | No |
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
MEM1 | 2025.7 | MIT | Paper | veRL (based on Search-R1)
Memento | 2025.6 | UCL, Huawei | Paper | Custom
MemAgent | 2025.6 | Bytedance, Tsinghua-SIA | Paper | veRL
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
MEM1 | PPO/GRPO | Single | Outcome | Multi | WebShop/GSM8K/QA | Rule/Model | Yes |
Memento | soft Q-Learning | Single | Outcome | Multi | Research/QA/Code/Web | External/Rule | Yes |
MemAgent | PPO/GRPO/DPO | Multi | Outcome | Multi | Long-context QA | Rule/Model/External | Yes
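The memory-agent projects above share one idea: instead of letting the context grow with every turn, the policy maintains a bounded memory that it rewrites as new input arrives. A hypothetical sketch of that loop (the overwrite-each-chunk scheme and the `policy_update` callable are our illustration, not a specific project's algorithm):

```python
def read_with_memory(policy_update, chunks, memory_limit: int = 512):
    """Process a long input chunk-by-chunk while keeping only a fixed-size
    memory string that the policy overwrites at every step."""
    memory = ""
    for chunk in chunks:
        # policy_update is an assumed callable: (old memory, new chunk) -> new memory
        memory = policy_update(memory, chunk)[:memory_limit]
    return memory
```

RL then rewards the final answer produced from the last memory state, so the policy learns which details are worth keeping under the size budget.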
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
Embodied-R1 | 2025.6 | Tianjin University | Paper | veRL
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
Embodied-R1 | GRPO | Single | Outcome | Single | Grounding/Waypoint | Rule | No |
Github Repo | Date | Org | Paper Link | RL Framework
---|---|---|---|---
MMedAgent-RL | 2025.8 | Unknown | Paper | Unknown
DoctorAgent-RL | 2025.5 | UCAS/CAS/USTC | Paper | RAGEN
Biomni | 2025.3 | Stanford University (SNAP) | Paper | Custom
📋 Click to view technical details
Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
---|---|---|---|---|---|---|---|
MMedAgent-RL | Unknown | Multi | Unknown | Unknown | Unknown | Unknown | Unknown |
DoctorAgent-RL | GRPO | Multi | Both | Multi | Consultation/Diagnosis | Model/Rule | No |
Biomni | TBD | Single | TBD | Single | scRNAseq/CRISPR/ADMET/Knowledge | TBD | Yes |
Github Repo | Date | Org | Task
---|---|---|---
CompassVerifier | 2025.7 | Shanghai AI Lab | Knowledge/Math/Science/GeneralReasoning
Mind2Web-2 | 2025.6 | Ohio State University | Web
gem | 2025.5 | Sea AI Lab | Math/Code/Game/QA
MLE-Dojo | 2025.5 | GIT, Stanford | MLE
atropos | 2025.4 | Nous Research | Game/Code/Tool
InternBootcamp | 2025.4 | InternBootcamp | Coding/QA/Game
loong | 2025.3 | CAMEL-AI.org | RLVR
reasoning-gym | 2025.1 | open-thought | Math/Game
llmgym | 2025.1 | tensorzero | TextGame/Tool
debug-gym | 2024.11 | Microsoft Research | Debugging/Game/Code
gym-llm | 2024.8 | Rodrigo Sánchez Molina | Control/Game
AgentGym | 2024.6 | Fudan | Web/Game
tau-bench | 2024.6 | Sierra | Tool
appworld | 2024.6 | Stony Brook University | Phone Use
android_world | 2024.5 | Google Research | Phone Use
TheAgentCompany | 2024.3 | CMU, Duke | Coding
LlamaGym | 2024.3 | Rohan Pandey | Game
visualwebarena | 2024.1 | CMU | Web
LMRL-Gym | 2023.12 | UC Berkeley | Game
OSWorld | 2023.10 | HKU, CMU, Salesforce, Waterloo | Computer Use
webarena | 2023.7 | CMU | Web
AgentBench | 2023.7 | Tsinghua University | Game/Web/QA/Tool
WebShop | 2022.7 | Princeton-NLP | Web
ScienceWorld | 2022.3 | AllenAI | TextGame/ScienceQA
factorio-learning-environment | 2021.6 | JackHopkins | Game
alfworld | 2020.10 | Microsoft, CMU, UW | Embodied
jericho | 2018.10 | Microsoft, GIT | TextGame
TextWorld | 2018.6 | Microsoft Research | TextWorld
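Most of the environments above expose a Gym-style reset/step interface where observations and actions are text. A hypothetical minimal example (a toy environment of our own, not any listed project's actual API) that also illustrates an Outcome reward arriving only on the terminal step:

```python
from dataclasses import dataclass, field

@dataclass
class GuessNumberEnv:
    """Toy Gym-style text environment: observations and actions are strings,
    reward is sparse and arrives only when the episode ends."""
    target: int = 7
    max_steps: int = 5
    steps: int = field(default=0, init=False)

    def reset(self) -> str:
        self.steps = 0
        return "Guess a number between 1 and 10."

    def step(self, action: str):
        """Return (observation, reward, done) for one agent turn."""
        self.steps += 1
        guess = int(action)
        if guess == self.target:
            return "Correct!", 1.0, True
        done = self.steps >= self.max_steps
        hint = "higher" if guess < self.target else "lower"
        return f"Wrong, go {hint}.", 0.0, done
```

An agent trained against such an interface treats each `step` as one turn of a multi-turn trajectory, which is exactly the rollout shape the tables above assume.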
- JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning
- Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
- Acting Less is Reasoning More! Teaching Model to Act Efficiently
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
- Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
- MUA-RL: Multi-Turn User-Interacting Agent Reinforcement Learning for Agentic Tool Use
- Understanding Tool-Integrated Reasoning
- Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
- Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning
- SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
- WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
- EnvX: Agentize Everything with Agentic AI
- UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
- UI-Venus Technical Report: Building High-performance UI Agents with RFT
If you find this repository useful, please consider citing it:
@misc{agentsMeetRL,
  title={When LLM Agents Meet Reinforcement Learning: A Comprehensive Survey},
  author={AgentsMeetRL Contributors},
  year={2025},
  url={https://github.com/thinkwee/agentsMeetRL}
}
Made with ❤️ by the AgentsMeetRL community
Similar Open Source Tools


trae-agent
Trae-agent is a Python library for building and training reinforcement learning agents. It provides a simple and flexible framework for implementing various reinforcement learning algorithms and experimenting with different environments. With Trae-agent, users can easily create custom agents, define reward functions, and train them on a variety of tasks. The library also includes utilities for visualizing agent performance and analyzing training results, making it a valuable tool for both beginners and experienced researchers in the field of reinforcement learning.

docs
This repository contains the documentation for the Strands Agents SDK, a simple yet powerful framework for building and running AI agents. The documentation is built using MkDocs and provides guides, examples, and API references. The official documentation is available online at: https://strandsagents.com.

OpenManus-RL
OpenManus-RL is an open-source initiative focused on enhancing reasoning and decision-making capabilities of large language models (LLMs) through advanced reinforcement learning (RL)-based agent tuning. The project explores novel algorithmic structures, diverse reasoning paradigms, sophisticated reward strategies, and extensive benchmark environments. It aims to push the boundaries of agent reasoning and tool integration by integrating insights from leading RL tuning frameworks and continuously updating progress in a dynamic, live-streaming fashion.

ms-agent
MS-Agent is a lightweight framework designed to empower agents with autonomous exploration capabilities. It provides a flexible and extensible architecture for creating agents capable of tasks like code generation, data analysis, and tool calling with MCP support. The framework supports multi-agent interactions, deep research, code generation, and is lightweight and extensible for various applications.

MaxKB
MaxKB is a knowledge base Q&A system based on large language models (LLMs). MaxKB stands for Max Knowledge Base; it aims to become the most powerful brain of the enterprise.

atomic-agents
The Atomic Agents framework is a modular and extensible tool designed for creating powerful applications. It leverages Pydantic for data validation and serialization. The framework follows the principles of Atomic Design, providing small and single-purpose components that can be combined. It integrates with Instructor for AI agent architecture and supports various APIs like Cohere, Anthropic, and Gemini. The tool includes documentation, examples, and testing features to ensure smooth development and usage.

pentest-agent
Pentest Agent is a lightweight and versatile tool designed for conducting penetration testing on network systems. It provides a user-friendly interface for scanning, identifying vulnerabilities, and generating detailed reports. The tool is highly customizable, allowing users to define specific targets and parameters for testing. Pentest Agent is suitable for security professionals and ethical hackers looking to assess the security posture of their systems and networks.

youtu-graphrag
Youtu-GraphRAG is a vertically unified agentic paradigm that connects the entire framework based on graph schema, allowing seamless domain transfer with minimal intervention. It introduces key innovations like schema-guided hierarchical knowledge tree construction, dually-perceived community detection, agentic retrieval, advanced construction and reasoning capabilities, fair anonymous dataset 'AnonyRAG', and unified configuration management. The framework demonstrates robustness with lower token cost and higher accuracy compared to state-of-the-art methods, enabling enterprise-scale deployment with minimal manual intervention for new domains.

llama.ui
llama.ui is an open-source desktop application that provides a beautiful, user-friendly interface for interacting with large language models powered by llama.cpp. It is designed for simplicity and privacy, allowing users to chat with powerful quantized models on their local machine without the need for cloud services. The project offers multi-provider support, conversation management with indexedDB storage, rich UI components including markdown rendering and file attachments, advanced features like PWA support and customizable generation parameters, and is privacy-focused with all data stored locally in the browser.

milvus
Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment. Milvus 2.0 is a cloud-native vector database with storage and computation separated by design. All components in this refactored version of Milvus are stateless to enhance elasticity and flexibility. For more architecture details, see Milvus Architecture Overview. Milvus was released under the open-source Apache License 2.0 in October 2019. It is currently a graduate project under LF AI & Data Foundation.

raft
RAFT (Reusable Accelerated Functions and Tools) is a C++ header-only template library with an optional shared library that contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

holmesgpt
HolmesGPT is an open-source DevOps assistant powered by OpenAI or any tool-calling LLM of your choice. It helps in troubleshooting Kubernetes, incident response, ticket management, automated investigation, and runbook automation in plain English. The tool connects to existing observability data, is compliance-friendly, provides transparent results, supports extensible data sources, runbook automation, and integrates with existing workflows. Users can install HolmesGPT using Brew, prebuilt Docker container, Python Poetry, or Docker. The tool requires an API key for functioning and supports OpenAI, Azure AI, and self-hosted LLMs.

authed
Authed is an identity and authentication system designed for AI agents, providing unique identities, secure agent-to-agent authentication, and dynamic access policies. It eliminates the need for static credentials and human intervention in authentication workflows. The protocol is developer-first, open-source, and scalable, enabling AI agents to interact securely across different ecosystems and organizations.

deepflow
DeepFlow is an open-source project that provides deep observability for complex cloud-native and AI applications. It offers Zero Code data collection with eBPF for metrics, distributed tracing, request logs, and function profiling. DeepFlow is integrated with SmartEncoding to achieve Full Stack correlation and efficient access to all observability data. With DeepFlow, cloud-native and AI applications automatically gain deep observability, removing the burden of developers continually instrumenting code and providing monitoring and diagnostic capabilities covering everything from code to infrastructure for DevOps/SRE teams.

traceroot
TraceRoot is a tool that helps engineers debug production issues 10× faster using AI-powered analysis of traces, logs, and code context. It accelerates the debugging process with AI-powered insights, integrates seamlessly into the development workflow, provides real-time trace and log analysis, code context understanding, and intelligent assistance. Features include ease of use, LLM flexibility, distributed services, AI debugging interface, and integration support. Users can get started with TraceRoot Cloud for a 7-day trial or self-host the tool. SDKs are available for Python and JavaScript/TypeScript.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.