
OpenManus-RL
A live-stream development of RL tuning for LLM agents
Stars: 1351

OpenManus-RL is an open-source initiative focused on enhancing reasoning and decision-making capabilities of large language models (LLMs) through advanced reinforcement learning (RL)-based agent tuning. The project explores novel algorithmic structures, diverse reasoning paradigms, sophisticated reward strategies, and extensive benchmark environments. It aims to push the boundaries of agent reasoning and tool integration by integrating insights from leading RL tuning frameworks and continuously updating progress in a dynamic, live-streaming fashion.
README:
OpenManus-RL is an open-source initiative collaboratively led by **Ulab-UIUC** and **MetaGPT**.
This project is an extended version of the original @OpenManus initiative. Inspired by the success of RL tuning for reasoning LLMs such as DeepSeek-R1 and QwQ-32B, we will explore new paradigms for RL-based LLM agent tuning, particularly building upon these foundations.
We are committed to regularly updating our exploration directions and results in a dynamic, live-streaming fashion. All progress, including rigorous testing on agent benchmarks such as GAIA, AgentBench, WebShop, and OSWorld, and tuned models, will be openly shared and continuously updated.
We warmly welcome contributions from the broader community. Join us in pushing the boundaries of agent reasoning and tool integration!
Code and dataset coming soon! Stay tuned!
- OpenManus-RL
- Running
- Related Work
- Acknowledgement
- Community Group
- Citation
- [2025-03-09] We have collected and open-sourced our Agent SFT dataset on Hugging Face. Go try it!
- [2025-03-08] We are collaborating with @OpenManus from MetaGPT to work on this project together!
- [2025-03-06] We (UIUC-Ulab) are announcing our live-streaming project, OpenManus-RL.
@Kunlun Zhu(Ulab-UIUC), @Jiayi Zhang(MetaGPT), @Xiangxin Zhou, @Yanfei Zhang, @Yingxuan Yang, @Weijia Zhang, @Muxin Tian, @Haofei Yu(Ulab-UIUC)
We wholeheartedly welcome suggestions, feedback, and contributions from the community! Feel free to:
We welcome contributions, including to the fine-tuning codebase, the tuning dataset, environment setup, and computing resources. Create issues for feature requests, bug reports, or ideas. Submit pull requests to help improve OpenManus-RL. Or simply reach out to us for direct collaboration. Important contributors will be listed as co-authors on our paper.
- **Agent Environment Support**: Set up the LLM agent environment for online RL tuning.
- **Agent Trajectories Data Collection**: Connect to specialized reasoning models such as DeepSeek-R1 and QwQ-32B for more complex inference tasks to collect comprehensive agent trajectories.
- **RL-Tuning Model Paradigm**: Provide an RL fine-tuning approach for customizing the agent's behavior in our agent environment.
- **Test on Agent Benchmarks**: Evaluate our framework on agentic benchmarks such as WebShop, GAIA, OSWorld, and AgentBench.
Our method proposes an advanced reinforcement learning (RL)-based agent tuning framework designed to significantly enhance reasoning and decision-making capabilities of large language models (LLMs). Drawing inspiration from RAGEN's Reasoning-Interaction Chain Optimization (RICO), our approach further explores novel algorithmic structures, diverse reasoning paradigms, sophisticated reward strategies, and extensive benchmark environments.
To benchmark the reasoning capabilities effectively, we evaluate multiple state-of-the-art reasoning models:
- OpenAI o1
- DeepSeek-R1
- QwQ-32B
Each model provides unique reasoning capabilities that inform downstream optimization and training strategies.
We experiment with a variety of rollout strategies to enhance agent planning efficiency and reasoning robustness, including:
- Tree-of-Thoughts (ToT): Employs tree-based reasoning paths, enabling agents to explore branching possibilities systematically.
- Graph-of-Thoughts (GoT): Utilizes graph structures to represent complex reasoning dependencies effectively.
- DFSDT (Depth-First Search Decision Trees): Optimizes action selection through depth-first search, enhancing long-horizon planning.
- Monte Carlo Tree Search (MCTS): Explores reasoning and decision paths probabilistically, balancing exploration and exploitation effectively.
These methods help identify optimal rollout techniques for various reasoning tasks.
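To make these rollout strategies concrete, below is a minimal, self-contained sketch of a Tree-of-Thoughts style loop: expand each frontier state, score the candidates, and keep a small beam. The `propose_actions` and `score_state` helpers are hypothetical stand-ins for LLM calls, not part of the OpenManus-RL codebase.

```python
# Minimal Tree-of-Thoughts-style rollout sketch (illustrative assumptions:
# propose_actions and score_state are placeholders for LLM calls).
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                       # textual reasoning state accumulated so far
    score: float = 0.0               # value assigned by the evaluator
    children: list = field(default_factory=list)

def propose_actions(state: str, k: int = 3) -> list[str]:
    """Placeholder for an LLM call proposing k candidate next thoughts."""
    return [f"{state} -> thought_{i}" for i in range(k)]

def score_state(state: str) -> float:
    """Placeholder for an LLM/value-model call rating a partial solution."""
    return float(len(state) % 7)     # dummy heuristic for the sketch

def tot_rollout(root_state: str, depth: int = 3, beam: int = 2) -> Node:
    """Breadth-first ToT: expand, score, keep the top-`beam` branches per level."""
    frontier = [Node(root_state)]
    for _ in range(depth):
        candidates = []
        for node in frontier:
            for action in propose_actions(node.state):
                child = Node(action, score=score_state(action))
                node.children.append(child)
                candidates.append(child)
        frontier = sorted(candidates, key=lambda n: n.score, reverse=True)[:beam]
    return max(frontier, key=lambda n: n.score)

print(tot_rollout("task: count files in /etc").state)
```

Swapping the frontier policy changes the strategy: a stack yields DFSDT-like depth-first behavior, while UCT-weighted sampling moves toward MCTS.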
We specifically analyze and compare several reasoning output formats, notably:
- ReAct: Integrates reasoning and action explicitly, encouraging structured decision-making.
- Outcome-based Reasoning: Optimizes toward explicit outcome predictions, driving focused goal alignment.
These formats are rigorously compared to derive the most effective reasoning representation for various tasks.
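As a small illustration of the contrast, the snippet below shows a ReAct-style turn (matching the dataset example later in this README) next to an outcome-only completion, plus a helper that splits the Think/Act parts; the string layout follows the dataset, but the parser itself is an illustrative assumption.

```python
# ReAct interleaves explicit reasoning ("Think:") with an action ("Act:").
react_turn = "Think: Need reliable counting method\nAct: bash\nls -1 /etc | wc -l"

# An outcome-based format commits directly to the final answer instead.
outcome_turn = "answer(220)"

def parse_react(turn: str) -> tuple[str, str]:
    """Split a ReAct-style completion into its reasoning and action parts."""
    think, _, act = turn.partition("\nAct:")
    return think.removeprefix("Think:").strip(), act.strip()

print(parse_react(react_turn))
# -> ('Need reliable counting method', 'bash\nls -1 /etc | wc -l')
```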
We investigate multiple post-training methodologies to fine-tune agent reasoning effectively:
- Supervised Fine-Tuning (SFT): Initializes reasoning capabilities using human-annotated instructions.
- Group Relative Policy Optimization (GRPO): Incorporates the following rewards (a minimal advantage-computation sketch follows this list):
  - Format-based Rewards: Rewards adherence to specified reasoning structures.
  - Outcome-based Rewards: Rewards accurate task completion and goal attainment.
- Proximal Policy Optimization (PPO): Enhances agent training stability through proximal updates.
- Direct Preference Optimization (DPO): Leverages explicit human preferences to optimize agent outputs directly.
- Preference-based Reward Modeling (PRM): Uses learned reward functions derived from human preference data.
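As referenced in the GRPO item above, the core of GRPO is a group-relative advantage: sample several completions per prompt, score them with the reward functions, and normalize each reward against its group, following the DeepSeekMath formulation. This is a minimal sketch; the reward values are made up for illustration.

```python
# Group-relative advantage sketch for GRPO (illustrative rewards).
import statistics

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """A_i = (r_i - mean(r)) / std(r): completions scored above their group's
    mean get positive advantage and are reinforced; the rest are suppressed."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0   # guard against zero spread
    return [(r - mean) / std for r in group_rewards]

# e.g. 4 completions sampled for one prompt, scored by format + outcome rewards
print(grpo_advantages([1.0, 0.0, 0.5, 1.0]))
```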
We train specialized agent reward models using annotated data to accurately quantify nuanced reward signals. These models are then leveraged to guide agent trajectory selection during both training and evaluation phases.
During the inference phase, trajectory scaling methods are implemented, allowing agents to flexibly adapt to varying task complexities, thus enhancing robustness and performance in real-world scenarios.
Agents are equipped with action-space awareness, employing systematic exploration strategies designed to navigate complex action spaces effectively, ultimately maximizing expected rewards.
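Taken together, the three paragraphs above amount to reward-model-guided trajectory selection. A minimal best-of-N sketch follows; `generate_trajectory` and `reward_model` are hypothetical placeholders for the policy LLM and the trained agent reward model, and increasing `n` is the simplest form of inference-time trajectory scaling.

```python
# Best-of-N trajectory selection sketch (placeholder policy and reward model).
import random

def generate_trajectory(task: str, seed: int) -> list[str]:
    """Placeholder: one sampled (thought, action, observation) rollout."""
    rng = random.Random(seed)
    return [f"step_{i}:{rng.random():.2f}" for i in range(3)]

def reward_model(trajectory: list[str]) -> float:
    """Placeholder: learned scorer over a full trajectory."""
    return sum(float(step.split(":")[1]) for step in trajectory)

def best_of_n(task: str, n: int = 8) -> list[str]:
    """Sample n trajectories and keep the one the reward model scores highest."""
    rollouts = [generate_trajectory(task, seed) for seed in range(n)]
    return max(rollouts, key=reward_model)

print(best_of_n("count files in /etc"))
```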
We integrate insights and methodologies from leading RL tuning frameworks, including:
- Verl
- TinyZero
- OpenR1
- Trlx
Through these frameworks, agents can effectively balance exploration and exploitation, optimize reasoning processes, and adapt dynamically to novel environments.
In summary, our method systematically integrates advanced reasoning paradigms, diverse rollout strategies, sophisticated reward modeling, and robust RL frameworks, significantly advancing the capability and adaptability of reasoning-enhanced LLM agents.
The OpenManusRL-Dataset combines agent trajectories from AgentInstruct, Agent-FLAN, and AgentTraj-L (AgentGym), with the following features:
- **ReAct Framework** - Reasoning-Acting integration
- **Structured Training** - Separate format/reasoning learning
- **Anti-Hallucination** - Negative samples + environment grounding
- **6 Domains** - OS, DB, Web, KG, Household, E-commerce
| Source | Trajectories | Avg Turns | Key Features |
|---|---|---|---|
| AgentInstruct | 1,866 | 5.24 | Multi-task QA, CoT reasoning |
| Agent-FLAN | 34,442 | 3-35 | Error recovery patterns, diverse real-world tasks |
| AgentTraj-L | 14,485 | 3-35 | Interactive environments and tasks |
| Combined | 50,793 | 4-20 | Enhanced generalization, uniform format with broader task coverage |
- text-generation: ReAct-style instruction following
- conversational-ai: Tool-augmented dialogues
Language: English
ReAct Pattern Example:
```json
{
  "id": "os_0",
  "conversations": [
    {"role": "user", "content": "Count files in /etc"},
    {"role": "assistant", "content": "Think: Need reliable counting method\nAct: bash\n```bash\nls -1 /etc | wc -l\n```"},
    {"role": "user", "content": "OS Output: 220"},
    {"role": "assistant", "content": "Think: Verified through execution\nAct: answer(220)"}
  ]
}
```
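Assuming the Hugging Face dataset id used by the training commands below and the conversation schema shown above, the data can be inspected with the `datasets` library, for example:

```python
# Quick look at the released SFT data (schema assumed from the example above).
from datasets import load_dataset

ds = load_dataset("CharlieDreemur/OpenManus-RL", split="train")
for turn in ds[0]["conversations"]:
    print(turn["role"], "->", turn["content"][:60])
```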
A simplified library for Supervised Fine-Tuning (SFT) and GRPO tuning of language models for agentic systems (developed upon Open-R1 from Hugging Face). We are still actively developing this part; feedback is welcome.
First, create a conda environment and activate it:
```bash
conda create -n openmanus-rl python=3.10
conda activate openmanus-rl
```
Then, install the required dependencies:
```bash
pip install -r requirements.txt
```
Supervised Fine-Tuning (SFT)
Basic Usage
To fine-tune a model on a single GPU:
```bash
python -m openmanus_rl.sft \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --dataset_name CharlieDreemur/OpenManus-RL \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --max_seq_length 4096 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --gradient_checkpointing \
    --bf16 \
    --logging_steps 5 \
    --output_dir data/sft-output
```
Distributed Training with Accelerate
For multi-GPU training using Accelerate:
```bash
accelerate launch --config_file=configs/accelerate_configs/zero3.yaml openmanus_rl/sft.py \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --dataset_name CharlieDreemur/OpenManus-RL \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --max_seq_length 4096 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --gradient_checkpointing \
    --bf16 \
    --logging_steps 5 \
    --output_dir data/sft-output
```
GRPO
Basic Usage
To fine-tune a model using GRPO on a single GPU:
```bash
python -m openmanus_rl.grpo \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --dataset_name CharlieDreemur/OpenManus-RL-GRPO \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --max_seq_length 4096 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --gradient_checkpointing \
    --bf16 \
    --reward_funcs accuracy format tag_count \
    --logging_steps 5 \
    --output_dir data/grpo-output
```
Distributed Training with Accelerate
For multi-GPU training using Accelerate:
```bash
accelerate launch --config_file=configs/accelerate_configs/zero3.yaml openmanus_rl/grpo.py \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --dataset_name CharlieDreemur/OpenManus-RL-GRPO \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --max_seq_length 4096 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --gradient_checkpointing \
    --bf16 \
    --reward_funcs accuracy format tag_count \
    --logging_steps 5 \
    --output_dir data/grpo-output
```
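For intuition about the `--reward_funcs` flag, here is a hedged sketch of what format-oriented rewards such as `format` and `tag_count` might check against the dataset's Think/Act pattern. This is an illustrative re-implementation under assumed conventions, not the project's actual reward code.

```python
# Illustrative format rewards (assumed semantics, not the project's code).
import re

def format_reward(completion: str) -> float:
    """Full credit only if the completion follows the Think/Act pattern."""
    return 1.0 if re.search(r"Think:.*\nAct:", completion, re.DOTALL) else 0.0

def tag_count_reward(completion: str) -> float:
    """Partial credit: fraction of expected tags appearing exactly once."""
    tags = ["Think:", "Act:"]
    return sum(completion.count(tag) == 1 for tag in tags) / len(tags)

print(format_reward("Think: count files\nAct: bash"))  # 1.0
print(tag_count_reward("Act: bash"))                   # 0.5
```

An outcome-based reward such as `accuracy` would instead compare the final `answer(...)` against ground truth.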
- Offline Training of Language Model Agents with Functions as Learnable Weights. [paper]
- FireAct: Toward Language Agent Fine-tuning. [paper]
- AgentTuning: Enabling Generalized Agent Abilities for LLMs. [paper]
- ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy. [paper]
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents. [paper]
- ATLAS: Agent Tuning via Learning Critical Steps. [paper]
- Toolformer: Language Models Can Teach Themselves to Use Tools. [paper]
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. [paper]
- Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models. [paper]
- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning. [paper]
- Training Language Models to Follow Instructions with Human Feedback. [paper]
- Deepseekmath: Pushing the Limits of Mathematical Reasoning in Open Language Models. [paper]
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. [paper]
- AgentBench: Evaluating LLMs as Agents. [paper]
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. [paper]
- AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents. [paper]
- WebShop: Towards Scalable Real-World Web Interaction with Autonomous Agents. [paper]
- GAIA: A Benchmark for General AI Assistants. [paper]
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks. [paper]
- RAGEN: Training Agents by Reinforcing Reasoning. [code]
We extend our thanks to Ulab-UIUC (https://ulab-uiuc.github.io/) and the OpenManus (https://github.com/mannaandpoem/OpenManus) team from MetaGPT for their support and shared knowledge. Their mission and community contributions help drive innovations like OpenManus forward. Developers interested in this project are welcome to reach out to kunlunz2@illinois.edu for direct collaboration.
Stay tuned for updates and the official release of our repository. Together, let's build a thriving open-source agent ecosystem!
Join our networking group on Feishu and share your experience with other developers!
Please cite the following paper if you find OpenManus helpful!
```bibtex
@misc{OpenManus,
  author       = {OpenManus-RL Team},
  title        = {OpenManus-RL: Open Platform for Generalist LLM Reasoning Agents with RL optimization},
  year         = {2025},
  organization = {GitHub},
  url          = {https://github.com/OpenManus/OpenManus-RL},
}
```
Similar Open Source Tools


MM-RLHF
MM-RLHF is a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. It includes a high-quality MLLM alignment dataset, a Critique-Based MLLM reward model, a novel alignment algorithm MM-DPO, and benchmarks for reward models and multimodal safety. The dataset covers image understanding, video understanding, and safety-related tasks with model-generated responses and human-annotated scores. The reward model generates critiques of candidate texts before assigning scores for enhanced interpretability. MM-DPO is an alignment algorithm that achieves performance gains with simple adjustments to the DPO framework. The project enables consistent performance improvements across 10 dimensions and 27 benchmarks for open-source MLLMs.

llms-interview-questions
This repository contains a comprehensive collection of 63 must-know Large Language Models (LLMs) interview questions. It covers topics such as the architecture of LLMs, transformer models, attention mechanisms, training processes, encoder-decoder frameworks, differences between LLMs and traditional statistical language models, handling context and long-term dependencies, transformers for parallelization, applications of LLMs, sentiment analysis, language translation, conversation AI, chatbots, and more. The readme provides detailed explanations, code examples, and insights into utilizing LLMs for various tasks.

shandu
Shandu is an advanced AI research system that automates comprehensive research processes using language models, web scraping, and iterative exploration to generate well-structured reports with citations. It features intelligent state-based workflow, deep exploration, multi-source information synthesis, enhanced web scraping, smart source evaluation, content analysis pipeline, comprehensive report generation, parallel processing, adaptive search strategy, and full citation management.

DriveLM
DriveLM is a multimodal AI model that enables autonomous driving by combining computer vision and natural language processing. It is designed to understand and respond to complex driving scenarios using visual and textual information. DriveLM can perform various tasks related to driving, such as object detection, lane keeping, and decision-making. It is trained on a massive dataset of images and text, which allows it to learn the relationships between visual cues and driving actions. DriveLM is a powerful tool that can help to improve the safety and efficiency of autonomous vehicles.

ComfyUI-Copilot
ComfyUI-Copilot is an intelligent assistant built on the Comfy-UI framework that simplifies and enhances the AI algorithm debugging and deployment process through natural language interactions. It offers intuitive node recommendations, workflow building aids, and model querying services to streamline development processes. With features like interactive Q&A bot, natural language node suggestions, smart workflow assistance, and model querying, ComfyUI-Copilot aims to lower the barriers to entry for beginners, boost development efficiency with AI-driven suggestions, and provide real-time assistance for developers.

fastRAG
fastRAG is a research framework designed to build and explore efficient retrieval-augmented generative models. It incorporates state-of-the-art Large Language Models (LLMs) and Information Retrieval to empower researchers and developers with a comprehensive tool-set for advancing retrieval augmented generation. The framework is optimized for Intel hardware, customizable, and includes key features such as optimized RAG pipelines, efficient components, and RAG-efficient components like ColBERT and Fusion-in-Decoder (FiD). fastRAG supports various unique components and backends for running LLMs, making it a versatile tool for research and development in the field of retrieval-augmented generation.

shards
Shards is a high-performance, multi-platform, type-safe programming language designed for visual development. It is a dataflow visual programming language that enables building full-fledged apps and games without traditional coding. Shards features automatic type checking, optimized shard implementations for high performance, and an intuitive visual workflow for beginners. The language allows seamless round-trip engineering between code and visual models, empowering users to create multi-platform apps easily. Shards also powers an upcoming AI-powered game creation system, enabling real-time collaboration and game development in a low to no-code environment.

GPT4Point
GPT4Point is a unified framework for point-language understanding and generation. It aligns 3D point clouds with language, providing a comprehensive solution for tasks such as 3D captioning and controlled 3D generation. The project includes an automated point-language dataset annotation engine, a novel object-level point cloud benchmark, and a 3D multi-modality model. Users can train and evaluate models using the provided code and datasets, with a focus on improving models' understanding capabilities and facilitating the generation of 3D objects.

chatbox
Chatbox is a desktop client for ChatGPT, Claude, and other LLMs, providing a user-friendly interface for AI copilot assistance on Windows, Mac, and Linux. It offers features like local data storage, multiple LLM provider support, image generation with Dall-E-3, enhanced prompting, keyboard shortcuts, and more. Users can collaborate, access the tool on various platforms, and enjoy multilingual support. Chatbox is constantly evolving with new features to enhance the user experience.

llm-awq
AWQ (Activation-aware Weight Quantization) is a tool designed for efficient and accurate low-bit weight quantization (INT3/4) for Large Language Models (LLMs). It supports instruction-tuned models and multi-modal LMs, providing features such as AWQ search for accurate quantization, pre-computed AWQ model zoo for various LLMs, memory-efficient 4-bit linear in PyTorch, and efficient CUDA kernel implementation for fast inference. The tool enables users to run large models on resource-constrained edge platforms, delivering more efficient responses with LLM/VLM chatbots through 4-bit inference.

holisticai
Holistic AI is an open-source library dedicated to assessing and improving the trustworthiness of AI systems. It focuses on measuring and mitigating bias, explainability, robustness, security, and efficacy in AI models. The tool provides comprehensive metrics, mitigation techniques, a user-friendly interface, and visualization tools to enhance AI system trustworthiness. It offers documentation, tutorials, and detailed installation instructions for easy integration into existing workflows.

awesome-ai-coding
Awesome-AI-Coding is a curated list of AI coding topics, projects, datasets, LLM models, embedding models, papers, blogs, products, startups, and peer awesome lists related to artificial intelligence in coding. It includes tools for code completion, code generation, code documentation, and code search, as well as AI models and techniques for improving developer productivity. The repository also features information on various AI-powered developer tools, copilots, and related resources in the AI coding domain.

R2R
R2R (RAG to Riches) is a fast and efficient framework for serving high-quality Retrieval-Augmented Generation (RAG) to end users. The framework is designed with customizable pipelines and a feature-rich FastAPI implementation, enabling developers to quickly deploy and scale RAG-based applications. R2R was conceived to bridge the gap between local LLM experimentation and scalable production solutions; R2R is to LangChain/LlamaIndex what NextJS is to React. A JavaScript client for R2R deployments is also available. Key features include instant deployment of production-ready RAG pipelines with streaming capabilities, pipeline customization through intuitive configuration files, extension with custom code integrations, effortless cloud autoscaling via SciPhi, and an open-source framework built by the community to simplify RAG deployment.

rllm
rLLM (relationLLM) is a Pytorch library for Relational Table Learning (RTL) with LLMs. It breaks down state-of-the-art GNNs, LLMs, and TNNs as standardized modules and facilitates novel model building in a 'combine, align, and co-train' way using these modules. The library is LLM-friendly, processes various graphs as multiple tables linked by foreign keys, introduces new relational table datasets, and is supported by students and teachers from Shanghai Jiao Tong University and Tsinghua University.

PPTAgent
PPTAgent is an innovative system that automatically generates presentations from documents. It employs a two-step process for quality assurance and introduces PPTEval for comprehensive evaluation. With dynamic content generation, smart reference learning, and quality assessment, PPTAgent aims to streamline presentation creation. The tool follows an analysis phase to learn from reference presentations and a generation phase to develop structured outlines and cohesive slides. PPTEval evaluates presentations based on content accuracy, visual appeal, and logical coherence.
For similar tasks

Mortal
Mortal (凡夫) is a free and open source AI for Japanese mahjong, powered by deep reinforcement learning. It provides a comprehensive solution for playing Japanese mahjong with AI assistance. The project focuses on utilizing deep reinforcement learning techniques to enhance gameplay and decision-making in Japanese mahjong. Mortal offers a user-friendly interface and detailed documentation to assist users in understanding and utilizing the AI effectively. The project is actively maintained and welcomes contributions from the community to further improve the AI's capabilities and performance.

Smart-Connections-Visualizer
The Smart Connections Visualizer Plugin is a tool designed to enhance note-taking and information visualization by creating dynamic force-directed graphs that represent connections between notes or excerpts. Users can customize visualization settings, preview notes, and interact with the graph to explore relationships and insights within their notes. The plugin aims to revolutionize communication with AI and improve decision-making processes by visualizing complex information in a more intuitive and context-driven manner.


RLHF-Reward-Modeling
This repository contains code for training reward models for Reinforcement Learning from Human Feedback (RLHF), iterative rejection-sampling fine-tuning, and iterative Direct Preference Optimization (DPO). The reward models are trained using a Bradley-Terry model based on the Gemma and Mistral language models. The resulting reward models achieve state-of-the-art performance on the RewardBench leaderboard for reward models with base models of up to 13B parameters.

h2o-llmstudio
H2O LLM Studio is a framework and no-code GUI designed for fine-tuning state-of-the-art large language models (LLMs). With H2O LLM Studio, you can easily and effectively fine-tune LLMs without the need for any coding experience. The GUI is specially designed for large language models, and you can finetune any LLM using a large variety of hyperparameters. You can also use recent finetuning techniques such as Low-Rank Adaptation (LoRA) and 8-bit model training with a low memory footprint. Additionally, you can use Reinforcement Learning (RL) to finetune your model (experimental), use advanced evaluation metrics to judge generated answers by the model, track and compare your model performance visually, and easily export your model to the Hugging Face Hub and share it with the community.

MathCoder
MathCoder is a repository focused on enhancing mathematical reasoning by fine-tuning open-source language models to use code for modeling and deriving math equations. It introduces MathCodeInstruct dataset with solutions interleaving natural language, code, and execution results. The repository provides MathCoder models capable of generating code-based solutions for challenging math problems, achieving state-of-the-art scores on MATH and GSM8K datasets. It offers tools for model deployment, inference, and evaluation, along with a citation for referencing the work.

Awesome-Text2SQL
Awesome Text2SQL is a curated repository containing tutorials and resources for Large Language Models, Text2SQL, Text2DSL, Text2API, Text2Vis, and more. It provides guidelines on converting natural language questions into structured SQL queries, with a focus on NL2SQL. The repository includes information on various models, datasets, evaluation metrics, fine-tuning methods, libraries, and practice projects related to Text2SQL. It serves as a comprehensive resource for individuals interested in working with Text2SQL and related technologies.

Awesome-LLM
Awesome-LLM is a curated list of resources related to large language models, focusing on papers, projects, frameworks, tools, tutorials, courses, opinions, and other useful resources in the field. It covers trending LLM projects, milestone papers, other papers, open LLM projects, LLM training frameworks, LLM evaluation frameworks, tools for deploying LLM, prompting libraries & tools, tutorials, courses, books, and opinions. The repository provides a comprehensive overview of the latest advancements and resources in the field of large language models.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI, and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Use cases include setting LLM usage limits for users on different pricing tiers, tracking LLM usage on a per-user and per-organization basis, blocking or redacting requests containing PIIs, improving LLM reliability with failovers, retries, and caching, and distributing API keys with rate limits and cost limits for internal development, production, or student use.

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.