
RLinf-logo
Hugging Face Ask DeepWiki

RLinf: Reinforcement Learning Infrastructure for Agentic AI

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.

RLinf-overview

What's NEW!

Key Features

RLinf is unique with:

  • Macro-to-Micro Flow: a new paradigm, M2Flow, which executes macro-level logical flows through micro-level execution flows, decoupling logical workflow construction (programmability) from physical communication and scheduling (efficiency); a minimal sketch of this decoupling follows this feature list.

  • Flexible Execution Modes

    • Collocated mode: shares all GPUs across all workers.
    • Disaggregated mode: enables fine-grained pipelining.
    • Hybrid mode: a customizable combination of different placement modes, integrating both collocated and disaggregated modes.
  • Auto-scheduling Strategy: automatically selects the most suitable execution mode based on the training workload, without the need for manual resource allocation.

  • Embodied Agent Support

    • Fast adaptation support for mainstream VLA models: OpenVLA, OpenVLA-OFT, and π₀.
    • Support for mainstream CPU & GPU-based simulators via standardized RL interfaces: ManiSkill3, LIBERO.
    • Enabling the first RL fine-tuning of the π₀ model family with a flow-matching action expert.
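
To make the M2Flow idea and the execution modes above concrete, here is a minimal Python sketch. It is illustrative only: the names (Placement, logical_flow, and the rollout/reward/trainer objects) are hypothetical placeholders, not RLinf's actual API. The point is that the macro-level logical flow is written once, while the physical placement it runs on can vary.

from dataclasses import dataclass

# Illustrative sketch only: these names are hypothetical placeholders and
# do not correspond to RLinf's real API.

@dataclass
class Placement:
    mode: str           # "collocated", "disaggregated", or "hybrid"
    rollout_gpus: list  # GPUs serving generation / environment rollout
    train_gpus: list    # GPUs running the policy update

def logical_flow(rollout, reward, trainer, prompts):
    """Macro-level logical flow: written once, independent of placement."""
    trajectories = rollout.generate(prompts)   # generation / env interaction
    scored = reward.score(trajectories)        # reward computation
    trainer.update(scored)                     # policy optimization step

# Micro-level execution: the same flow can be mapped to different placements.
collocated    = Placement("collocated",    rollout_gpus=[0, 1, 2, 3], train_gpus=[0, 1, 2, 3])
disaggregated = Placement("disaggregated", rollout_gpus=[0, 1],       train_gpus=[2, 3])
hybrid        = Placement("hybrid",        rollout_gpus=[0, 1],       train_gpus=[1, 2, 3])

Under this framing, the auto-scheduling strategy amounts to selecting among such placements based on the training workload rather than requiring the user to choose one by hand.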

RLinf is fast with:

  • Hybrid mode with fine-grained pipelining: achieves a 120%+ throughput improvement compared to other frameworks.
  • Automatic Online Scaling Strategy: dynamically scales training resources, with GPU switching completed within seconds, further improving efficiency by 20–40% while preserving the on-policy nature of RL algorithms.

RLinf is flexible and easy to use with:

  • Multiple Backend Integrations

    • FSDP + Hugging Face: rapid adaptation to new models and algorithms, ideal for beginners and fast prototyping.
    • Megatron + SGLang: optimized for large-scale training, delivering maximum efficiency for expert users with demanding workloads.
  • Adaptive communication via an asynchronous communication channel.

  • Built-in support for popular RL methods, including PPO, GRPO, DAPO, Reinforce++, and more (see the GRPO sketch below).
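
As a concrete illustration of one of the methods listed above, the sketch below shows the group-relative advantage computation at the core of GRPO: each sampled response is normalized by the mean and standard deviation of the rewards in its own group. This is a generic rendering of the algorithm, not RLinf's implementation.

import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as used in GRPO.

    rewards has shape (num_prompts, group_size): one row per prompt, one
    column per sampled response to that prompt. Each response is normalized
    by the mean and standard deviation of its own group.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(grpo_advantages(rewards))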

Roadmap

1. System-Level Enhancements

  • [ ] Support for heterogeneous GPUs
  • [ ] Support for asynchronous pipeline execution
  • [ ] Support for Mixture of Experts (MoE)
  • [ ] Support for vLLM inference backend

2. Application-Level Extensions

  • [ ] Support for Vision-Language Models (VLMs) training
  • [ ] Support for deep searcher agent training
  • [ ] Support for multi-agent training
  • [ ] Support for integration with more embodied simulators (e.g., Meta-World, GENESIS)
  • [ ] Support for more Vision Language Action models (VLAs), such as GR00T
  • [ ] Support for world model
  • [ ] Support for real-world RL embodied intelligence

Getting Started

Complete documentation for RLinf can be found here.

Quickstart

Key Design

Example Gallery

Advanced Features

Extending the Framework

Blogs

Build Status

Type              | Status
Reasoning RL-MATH | Build Status
Embodied RL-VLA   | Build Status

Contribution Guidelines

We welcome contributions to RLinf. Please read the contribution guide before contributing.

Citation and Acknowledgement

If you find RLinf helpful, please cite the GitHub repository:

@misc{RLinf_repo,
  title        = {RLinf: Reinforcement Learning Infrastructure for Agentic AI},
  howpublished = {\url{https://github.com/RLinf/RLinf}},
  note         = {GitHub repository},
  year         = {2025}
}

Paper: A full paper describing RLinf will be released by September 20, 2025. We will update this section with the official citation and BibTeX when they become available.

Acknowledgements: RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community. In particular, we would like to thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP). If we have inadvertently missed your project or contribution, please open an issue or a pull request so we can properly credit you.

Contact: We welcome applications from Postdocs, PhD/Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!
