ReaLHF

Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Stars: 86

ReaLHF is a distributed system designed for efficient RLHF training with Large Language Models (LLMs). It introduces a novel approach called parameter reallocation, which dynamically redistributes LLM parameters across the cluster and optimizes allocations and parallelism for each computation workload. This minimizes redundant communication and maximizes GPU utilization, yielding significantly higher Proximal Policy Optimization (PPO) training throughput than other systems. It supports large-scale training with various parallelism strategies and enables memory-efficient training with parameter and optimizer offloading. The system integrates with HuggingFace checkpoints and inference frameworks, so local or distributed experiments are easy to launch. ReaLHF offers versatile configuration customization and supports various RLHF algorithms, including DPO, PPO, RAFT, and more, and custom algorithms can be added while maintaining high efficiency.

README:

ReaL

| Documentation | Paper |

ReaL: Efficient RLHF Training for LLMs
with Parameter Reallocation

ReaL (short for ReaLlocation) is a distributed system designed for efficient RLHF training with LLMs. This is the library used to run experiments for the ICML 2024 oral paper "Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study".

ReaL introduces a novel approach called parameter reallocation, which dynamically redistributes LLM parameters across the cluster and adapts parallelization strategies during training. By optimizing allocations and parallelism for each computation workload, ReaL achieves significantly higher PPO training throughput compared to state-of-the-art open-source systems.

(In the following figure, the model size scales up as the number of GPUs increases: LLaMA 7B, LLaMA 13B, CodeLLaMA 34B, and finally the largest, LLaMA 70B.)

Throughput Comparison
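
To make the idea of choosing a parallelization strategy per workload more concrete, here is a toy sketch in Python. It is not ReaL's API or its actual search procedure: the `Layout` class, the `min_model_shards` constraint, and the tie-breaking rule are all assumptions invented for illustration. Each workload picks the layout with the most data parallelism that still splits the model across enough GPUs, and a reallocation is needed whenever two consecutive workloads end up with different layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical parallel layout: data, tensor, and pipeline parallel degrees."""
    dp: int
    tp: int
    pp: int


def candidate_layouts(n_gpus: int) -> list[Layout]:
    """Enumerate every (dp, tp, pp) whose product is exactly n_gpus."""
    layouts = []
    for tp in range(1, n_gpus + 1):
        if n_gpus % tp:
            continue
        rest = n_gpus // tp
        for pp in range(1, rest + 1):
            if rest % pp == 0:
                layouts.append(Layout(dp=rest // pp, tp=tp, pp=pp))
    return layouts


def pick_layout(n_gpus: int, min_model_shards: int) -> Layout:
    """Most data parallelism subject to tp * pp >= min_model_shards.

    Ties are broken toward fewer pipeline stages (an arbitrary toy rule).
    Raises ValueError if no layout satisfies the constraint.
    """
    feasible = [
        layout for layout in candidate_layouts(n_gpus)
        if layout.tp * layout.pp >= min_model_shards
    ]
    return min(feasible, key=lambda layout: (-layout.dp, layout.pp))


# Assumed memory needs: training also holds gradients and optimizer states,
# so it is given a larger minimum shard count than generation.
workloads = {"generate": 2, "train": 8}
plan = {name: pick_layout(8, shards) for name, shards in workloads.items()}

# Parameters have to move whenever consecutive workloads use different layouts.
needs_reallocation = plan["generate"] != plan["train"]
print(plan, needs_reallocation)
```

With 8 GPUs, this toy rule gives generation a data-parallel-heavy layout and training a fully model-parallel one, so the parameters would have to be redistributed between the two steps.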

News 📢

  • [2024/09/05] Releasing ReaL v0.3.0 - MoE RLHF, CUDAGraph generation, mini-batched execution, and more customized algorithms.

Features

  • Large-scale and high-throughput SFT/reward modeling/DPO/PPO/generation.
  • MoE model training and generation.
  • PPO tricks, e.g., GAE, advantage/value normalization, and reference EMA.
  • State-of-the-art RLHF algorithms, e.g., GRPO.
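
For readers unfamiliar with the PPO tricks listed above, the following is a minimal sketch of GAE followed by advantage normalization, written in plain Python. It follows the standard textbook recursion and is not taken from ReaL's code; the function names and default hyperparameters are assumptions chosen for illustration.

```python
import math


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    `values` must have len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of the current and future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def normalize(xs, eps=1e-8):
    """Shift and scale a batch of advantages to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


adv = gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
print(adv)
print(normalize(adv))
```

In this example the only reward arrives at the last step, so the unnormalized advantages simply decay by `gamma * lam` for each step further back in the trajectory.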

Highlights

🚀 Efficiency

  • Achieves state-of-the-art training throughput for RLHF using parameter reallocation.
  • Supports high-throughput generation with CUDAGraph and large-scale training with 3D parallelism.
  • Enables memory-efficient training with parameter and optimizer offloading.

✨ Ease of Use

  • Seamlessly integrates with HuggingFace checkpoints and inference frameworks like vLLM. No checkpoint conversion required.
  • Allows launching local or distributed experiments via Ray or SLURM with a single command.

Check out our tutorial to reproduce the full RLHF procedure (SFT/RW/PPO) with 4×LLaMA-7B in just 30 minutes.

🎯 Flexibility

  • Offers versatile configuration customization with Hydra structured config.
  • Supports many commonly used RLHF algorithms, including DPO, PPO, RAFT, and more.
  • Allows the addition of custom algorithms (e.g., ReMax, GRPO, reference model EMA, or an external reward signal) while maintaining high efficiency with ReaL's infrastructure.

Refer to our customization guide for hands-on examples.
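
As a flavor of what one of the custom algorithms mentioned above computes, here is a small sketch of the group-relative advantage at the core of GRPO. This is a generic illustration rather than ReaL's interface: the function name is hypothetical, and the use of the population standard deviation and of `eps` are assumptions, since implementations differ on those details.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled response relative to the others for the same prompt.

    `rewards` holds one scalar reward per response in the group. Each
    advantage is the reward's deviation from the group mean, scaled by the
    group standard deviation (population form, an assumption here).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
print(group_relative_advantages([2.0, 2.0, 2.0]))
```

Because the baseline comes from the group itself, no separate value model is needed for this step; a group whose responses all receive the same reward contributes zero advantage.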

Getting Started

We provide pre-built Docker images and PyPI packages. To use the latest version of our code, please install from the source (see detailed installation instructions here):

git clone https://github.com/openpsi-project/ReaLHF
cd ReaLHF
pip install -r requirements.txt
export MAX_JOBS=8

# GPU dependencies, not required on the launcher node.
pip install git+https://github.com/NVIDIA/[email protected] --no-deps --no-build-isolation
pip install flash_attn==2.4.2 --no-build-isolation 
pip3 install git+https://github.com/tgale96/[email protected] --no-build-isolation --no-deps  # For MoE

REAL_CUDA=1 pip install -e . --no-build-isolation

For detailed information, please visit our documentation site.

Acknowledgement

We would like to thank the authors of our paper and the following individuals for their contributions: Shusheng Xu and Jiaxuan Gao from Tsinghua University, and Weilin Liu, Wenjie Ye, and Chuyi He from OpenPsi Inc, for thoroughly testing and using ReaL in their research, and for providing valuable suggestions that greatly improved the system.

We also extend our gratitude to the following open-source LLM projects for providing references for our implementation:

  • Megatron-LM for TP/EP modules and the distributed optimizer

  • DeepSpeed for ZeRO and ZeRO-offload

  • vLLM for custom all-reduce and CUDA graph

Citation

If you find our system useful for your research or production, please cite our papers.

@article{mei2024realhf,
  title={ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation},
  author={Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
  journal={arXiv preprint arXiv:2406.14088},
  year={2024}
}
@article{xu2024dpo,
  title={Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study},
  author={Xu, Shusheng and Fu, Wei and Gao, Jiaxuan and Ye, Wenjie and Liu, Weilin and Mei, Zhiyu and Wang, Guangju and Yu, Chao and Wu, Yi},
  journal={arXiv preprint arXiv:2404.10719},
  year={2024}
}
