
openrl
Unified Reinforcement Learning Framework
Stars: 577

OpenRL is an open-source general reinforcement learning research framework that supports training for various tasks such as single-agent, multi-agent, offline RL, self-play, and natural language. Developed based on PyTorch, the goal of OpenRL is to provide a simple-to-use, flexible, efficient and sustainable platform for the reinforcement learning research community. It supports a universal interface for all tasks/environments, single-agent and multi-agent tasks, offline RL training with expert dataset, self-play training, reinforcement learning training for natural language tasks, DeepSpeed, Arena for evaluation, importing models and datasets from Hugging Face, user-defined environments, models, and datasets, gymnasium environments, callbacks, visualization tools, unit testing, and code coverage testing. It also supports various algorithms like PPO, DQN, SAC, and environments like Gymnasium, MuJoCo, Atari, and more.
README:
OpenRL-v0.2.1 was updated on Dec 20, 2023
The main branch is the latest version of OpenRL and is under active development. If you just want to try OpenRL, you can switch to the stable branch.
Documentation | Chinese Introduction | Chinese Documentation
OpenRL is an open-source general reinforcement learning research framework that supports training for various tasks such as single-agent, multi-agent, offline RL, self-play, and natural language. Developed based on PyTorch, the goal of OpenRL is to provide a simple-to-use, flexible, efficient and sustainable platform for the reinforcement learning research community.
Currently, the features supported by OpenRL include:
- A simple-to-use universal interface that supports training for all tasks/environments
- Support for both single-agent and multi-agent tasks
- Support for offline RL training with expert datasets
- Support for self-play training
- Reinforcement learning training support for natural language tasks (such as dialogue)
- Support for DeepSpeed
- Support for Arena, which allows convenient evaluation of various agents (even submissions for JiDi) in a competitive environment
- Importing models and datasets from Hugging Face, including loading Stable-baselines3 models from Hugging Face for testing and training
- Tutorial on how to integrate user-defined environments into OpenRL
- Support for models such as LSTM, GRU, Transformer, etc.
- Multiple training acceleration methods, including automatic mixed precision training and data collection with a half-precision policy network
- Support for user-defined training models, reward models, training data and environments
- Support for gymnasium environments
- Support for Callbacks, which can be used to implement various functions such as logging, saving, and early stopping
- Dictionary observation space support
- Support for popular visualization tools such as wandb and tensorboardX
- Serial or parallel environment training while ensuring consistent results in both modes
- Chinese and English documentation
- Unit testing and code coverage testing
- Compliance with Black Code Style guidelines and type checking
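As an illustration of the callback feature listed above, here is a minimal, framework-independent sketch. The names `EarlyStopCallback` and `run_training` are hypothetical and are not OpenRL's API; the sketch only shows how a training loop might consult a callback after each step to decide whether to stop early.

```python
from typing import Iterable, List


class EarlyStopCallback:
    """Hypothetical callback: stop once the mean of recent rewards reaches a target."""

    def __init__(self, target: float, window: int = 3) -> None:
        self.target = target
        self.window = window
        self.history: List[float] = []

    def on_step(self, reward: float) -> bool:
        """Record a reward and return False to ask the loop to stop."""
        self.history.append(reward)
        recent = self.history[-self.window:]
        window_full = len(recent) == self.window
        return not (window_full and sum(recent) / self.window >= self.target)


def run_training(rewards: Iterable[float], callback: EarlyStopCallback) -> int:
    """Toy loop: consume one reward per step and return the number of steps run."""
    steps = 0
    for reward in rewards:
        steps += 1
        if not callback.on_step(reward):
            break
    return steps


# The rewards stand in for per-step evaluation results of a real training run.
steps_run = run_training([1.0, 2.0, 5.0, 6.0, 7.0, 8.0], EarlyStopCallback(target=6.0))
print(steps_run)
```

Logging or checkpoint saving could be added through the same hook by performing the side effect inside `on_step` and always returning `True`.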
Algorithms currently supported by OpenRL (for more details, please refer to Gallery):
- Proximal Policy Optimization (PPO)
- Dual-clip PPO
- Multi-agent PPO (MAPPO)
- Joint-ratio Policy Optimization (JRPO)
- Generative Adversarial Imitation Learning (GAIL)
- Behavior Cloning (BC)
- Advantage Actor-Critic (A2C)
- Self-Play
- Deep Q-Network (DQN)
- Multi-Agent Transformer (MAT)
- Value-Decomposition Network (VDN)
- Soft Actor Critic (SAC)
- Deep Deterministic Policy Gradient (DDPG)
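To make the difference between PPO and Dual-clip PPO in the list above concrete, the following sketch computes both per-sample objectives in plain Python. It is not OpenRL's implementation; the function names and the default values `eps=0.2` and `c=3.0` are illustrative assumptions.

```python
def clip(x: float, low: float, high: float) -> float:
    """Clamp x to the interval [low, high]."""
    return max(low, min(x, high))


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (a quantity to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the estimated advantage.
    """
    return min(ratio * advantage, clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)


def dual_clip_objective(
    ratio: float, advantage: float, eps: float = 0.2, c: float = 3.0
) -> float:
    """Dual-clip variant: for negative advantages, bound the objective below by c * advantage (c > 1)."""
    base = ppo_clip_objective(ratio, advantage, eps)
    if advantage < 0:
        return max(base, c * advantage)
    return base


# Positive advantage: both variants cap the gain from a large ratio.
print(ppo_clip_objective(1.5, 1.0), dual_clip_objective(1.5, 1.0))
# Negative advantage with a very large ratio: only the dual-clip variant is bounded.
print(ppo_clip_objective(10.0, -1.0), dual_clip_objective(10.0, -1.0))
```

The second pair of values shows the motivation for the extra clip: with a negative advantage and a very large ratio, the standard objective is unbounded below, while the dual-clip objective cannot fall under `c * advantage`.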
Environments currently supported by OpenRL (for more details, please refer to Gallery):
- Gymnasium
- MuJoCo
- PettingZoo
- MPE
- Chat Bot
- Atari
- StarCraft II
- SMACv2
- Omniverse Isaac Gym
- DeepMind Control
- Snake
- gym-pybullet-drones
- EnvPool
- GridWorld
- Super Mario Bros
- Gym Retro
- Crafter
This framework has undergone multiple iterations by the OpenRL-Lab team, which has applied it in academic research, and it has now become a mature reinforcement learning framework.
OpenRL-Lab will continue to maintain and update OpenRL, and we welcome everyone to join our open-source community to contribute towards the development of reinforcement learning.
For more information about OpenRL, please refer to the documentation.
- Welcome to OpenRL
- Outline
- Why OpenRL?
- Installation
- Use Docker
- Quick Start
- Gallery
- Projects Using OpenRL
- Feedback and Contribution
- Maintainers
- Supporters
- Citing OpenRL
- License
- Acknowledgments
Here we provide a table for the comparison of OpenRL and existing popular RL libraries. OpenRL employs a modular design and high-level abstraction, allowing users to accomplish training for various tasks through a unified and user-friendly interface.
Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | DeepSpeed |
---|---|---|---|---|---|
OpenRL | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Stable Baselines3 | ❌ | ❌ | ❌ | ❌ | ❌ |
Ray/RLlib | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
DI-engine | ❌ | ✔️ | not fully supported | ✔️ | ❌ |
Tianshou | ❌ | not fully supported | not fully supported | ✔️ | ❌ |
MARLlib | ❌ | ✔️ | not fully supported | ❌ | ❌ |
MAPPO Benchmark | ❌ | ✔️ | ❌ | ❌ | ❌ |
RL4LMs | ✔️ | ❌ | ❌ | ❌ | ❌ |
trlx | ✔️ | ❌ | ❌ | ❌ | ✔️ |
trl | ✔️ | ❌ | ❌ | ❌ | ✔️ |
TimeChamber | ❌ | ❌ | ✔️ | ❌ | ❌ |
Users can directly install OpenRL via pip:
pip install openrl
If users are using Anaconda or Miniconda, they can also install OpenRL via conda:
conda install -c openrl openrl
Users who want to modify the source code can also install OpenRL from the source code:
git clone https://github.com/OpenRL-Lab/openrl.git && cd openrl
pip install -e .
After installation, users can check the version of OpenRL through command line:
openrl --version
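The installed version can also be checked from Python. The sketch below uses only the standard library's `importlib.metadata` and assumes the distribution is registered under the name `openrl`, as the pip command above suggests; `installed_version` is a hypothetical helper, not part of OpenRL.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if OpenRL is installed in this environment, otherwise None.
print(installed_version("openrl"))
```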
Tips: No installation is required to try OpenRL online through Colab.
OpenRL currently provides Docker images with and without GPU support. If the user's computer does not have an NVIDIA GPU, they can obtain an image without the GPU plugin using the following command:
sudo docker pull openrllab/openrl-cpu
If the user wants to accelerate training with a GPU, they can obtain it using the following command:
sudo docker pull openrllab/openrl
After successfully pulling the image, users can run OpenRL's Docker image using the following commands:
# Without GPU acceleration
sudo docker run -it openrllab/openrl-cpu
# With GPU acceleration
sudo docker run -it --gpus all --net host openrllab/openrl
Once inside the Docker container, users can check OpenRL's version and then run test cases using these commands:
# Check OpenRL version in Docker container
openrl --version
# Run test case
openrl --mode train --env CartPole-v1
OpenRL provides a simple and easy-to-use interface for beginners in reinforcement learning.
Below is an example of using the PPO algorithm to train the CartPole environment:
# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent
env = make("CartPole-v1", env_num=9) # Create an environment and set the environment parallelism to 9.
net = Net(env) # Create neural network.
agent = Agent(net) # Initialize the agent.
agent.train(total_time_steps=20000)  # Start training and set the total number of steps to 20,000 for the running environment.
Training an agent using OpenRL only requires four simple steps: Create Environment => Initialize Model => Initialize Agent => Start Training!
For a well-trained agent, users can also easily test the agent:
# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent
agent = Agent(Net(make("CartPole-v1", env_num=9))) # Initialize trainer.
agent.train(total_time_steps=20000)
# Create an environment for test, set the parallelism of the environment to 9, and set the rendering mode to group_human.
env = make("CartPole-v1", env_num=9, render_mode="group_human")
agent.set_env(env) # The agent requires an interactive environment.
obs, info = env.reset() # Initialize the environment to obtain initial observations and environmental information.
while True:
    action, _ = agent.act(obs)  # The agent predicts the next action based on environmental observations.
    # The environment takes one step according to the action, obtains the next observation, reward, whether it ends and environmental information.
    obs, r, done, info = env.step(action)
    if any(done):
        break
env.close() # Close test environment
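The test loop above stops as soon as any of the parallel environments reports that its episode has ended. The sketch below reproduces that control flow without OpenRL, using a hypothetical stand-in class `ToyVecEnv` whose sub-environments finish after fixed numbers of steps; it also accumulates per-environment rewards, which the example above does not do.

```python
from typing import List, Tuple


class ToyVecEnv:
    """Hypothetical stand-in for a vectorized env: sub-env i finishes after lengths[i] steps."""

    def __init__(self, lengths: List[int]) -> None:
        self.lengths = lengths
        self.t = 0

    def reset(self) -> List[int]:
        self.t = 0
        return [0] * len(self.lengths)

    def step(self, actions: List[int]) -> Tuple[List[int], List[float], List[bool]]:
        self.t += 1
        obs = [self.t] * len(self.lengths)
        rewards = [1.0] * len(self.lengths)
        done = [self.t >= n for n in self.lengths]
        return obs, rewards, done


env = ToyVecEnv(lengths=[5, 3, 4])
obs = env.reset()
totals = [0.0] * len(obs)
while True:
    actions = [0] * len(obs)  # Placeholder policy: always take action 0.
    obs, rewards, done = env.step(actions)
    totals = [total + r for total, r in zip(totals, rewards)]
    if any(done):  # Stop when the first sub-environment finishes.
        break
print(totals)
```

Because the loop exits on the first finished sub-environment, every entry of `totals` covers the same number of steps; a loop that should run every episode to completion would need to track `done` per environment instead.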
Executing the above code on a regular laptop takes only a few seconds to complete the training.
Tips: Users can also quickly train the CartPole environment by executing a single command in the terminal.
openrl --mode train --env CartPole-v1
For training tasks such as multi-agent and natural language processing, OpenRL also provides a similarly simple and easy-to-use interface.
For information on how to perform multi-agent training, set hyperparameters for training, load training configurations, use wandb, save GIF animations, etc., please refer to:
For information on natural language task training, loading models/datasets on Hugging Face, customizing training models/reward models, etc., please refer to:
For more information about OpenRL, please refer to the documentation.
In order to facilitate users' familiarity with the framework, we provide more examples and demos of using OpenRL in Gallery. Users are also welcome to contribute their own training examples and demos to the Gallery.
We have listed research projects that use OpenRL in the OpenRL Project. If you are using OpenRL in your research project, you are also welcome to join this list.
- If you have any questions or find bugs, you can check or ask in the Issues.
- Join the QQ group: OpenRL Official Communication Group
- Join the slack group to discuss OpenRL usage and development with us.
- Join the Discord group to discuss OpenRL usage and development with us.
- Send an E-mail to: [email protected]
- Join the GitHub Discussion.
The OpenRL framework and its documentation are still under active development. We welcome you to join us in making this project better:
- How to contribute code: Read the Contributors' Guide
- OpenRL Roadmap
At present, OpenRL is maintained by the following maintainers:
- Shiyu Huang(@huangshiyu13)
- Wenze Chen(@WentseChen)
- Yiwen Sun(@YiwenAI)
We welcome more contributors to join our maintenance team (send an E-mail to [email protected] to apply for joining the OpenRL team).
If our work has been helpful to you, please feel free to cite us:
@article{huang2023openrl,
title={OpenRL: A Unified Reinforcement Learning Framework},
author={Huang, Shiyu and Chen, Wentse and Sun, Yiwen and Bie, Fuqing and Tu, Wei-Wei},
journal={arXiv preprint arXiv:2312.16189},
year={2023}
}
OpenRL is released under the Apache 2.0 license.
The development of the OpenRL framework has drawn on the strengths of other reinforcement learning frameworks:
- Stable-baselines3: https://github.com/DLR-RM/stable-baselines3
- pytorch-a2c-ppo-acktr-gail: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail
- MAPPO: https://github.com/marlbenchmark/on-policy
- Gymnasium: https://github.com/Farama-Foundation/Gymnasium
- DI-engine: https://github.com/opendilab/DI-engine/
- Tianshou: https://github.com/thu-ml/tianshou
- RL4LMs: https://github.com/allenai/RL4LMs