AceCoder

The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis"


AceCoder introduces a fully automated pipeline for synthesizing large-scale, reliable test cases used for reward model training and reinforcement learning in coding scenarios. It curates datasets, trains reward models, and performs RL training to improve the coding abilities of language models, aiming to unlock the potential of RL training for code generation and push the boundaries of LLMs' coding abilities.

README:

🂡 AceCoder


Authors: Huaye Zeng, Dongfu Jiang, HaoZhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen  @ TIGER-Lab  

🔥News

Overview

Overview figure: ./assets/images/ac_overview.png

Abstract
  • We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests used for reward model training and reinforcement learning in the coding scenario. To do this, we curated the dataset AceCode-87K, where we start from a seed code dataset, prompt powerful LLMs to "imagine" proper test cases for each coding question, and filter out the noisy ones.

  • We trained two reward models, AceCodeRM-7B and AceCodeRM-32B, on the constructed preference pairs. Best-of-N sampling results on HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4) show consistent improvements.

  • We perform RL training from three policy models: Qwen2.5-7B-Instruct, Qwen2.5-Coder-7B-Base, and Qwen2.5-Coder-7B-Instruct. Two types of reward can be used: the trained reward model AceCodeRM-7B, or a rule-based reward, i.e. the binary pass rate over the test cases in the dataset. Additionally, we experiment with RL directly from the base model, as done for DeepSeek-R1. Results show that RL from the base Qwen2.5-Coder model yields a 25% improvement on HumanEval-plus and 6% on MBPP-plus within just 80 optimization steps.

  • To our knowledge, this is the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests used for reward model training and reinforcement learning in the coding scenario. We believe AceCode-87K will unlock the potential of RL training for code generation models and help the community further push the boundaries of LLMs' coding abilities.
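The rule-based reward mentioned above can be sketched as follows: a candidate program earns reward 1.0 only if it passes every synthesized test case, and 0.0 otherwise. This is a minimal illustrative sketch, not the repository's actual implementation; the function name and the assert-string test-case format are assumptions.

```python
# Hedged sketch of a binary pass-rate reward: reward 1.0 iff the candidate
# program passes all assert-style test cases. Illustrative only; the real
# reward logic lives in the AceCoder training code.

def rule_based_reward(program: str, test_cases: list[str]) -> float:
    """Return 1.0 if `program` passes all test cases, else 0.0."""
    namespace: dict = {}
    try:
        exec(program, namespace)       # define the candidate solution
        for case in test_cases:
            exec(case, namespace)      # each case is an `assert ...` line
    except Exception:
        return 0.0                     # any failure -> zero reward
    return 1.0

# Example: a correct and an incorrect candidate for the same prompt.
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
print(rule_based_reward(good, tests))  # 1.0
print(rule_based_reward(bad, tests))   # 0.0
```

In production one would run candidates in a sandboxed subprocess with a timeout rather than `exec` in-process, but the reward signal is the same.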

📚Dataset

  • AceCode-87K: The first large-scale coding dataset with an average of 16 test cases per prompt, synthesized by GPT-4o-mini.
  • AceCodePair-300K: Preference pairs constructed from AceCode-87K for training the reward model.
  • AceCode-87K-hard: A subset of the hardest 25% of examples; you can create it by following the commands here.
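A hard subset like AceCode-87K-hard can be thought of as keeping the fraction of examples a policy model solves least often. The sketch below is an assumption about the selection idea, not the repository's actual script; the record fields (`id`, `pass_rate`) are illustrative.

```python
# Hedged sketch: derive a "hard" subset by keeping the 25% of examples
# with the lowest pass rate under some policy model. Field names are
# illustrative assumptions, not the dataset's real schema.

def hard_subset(examples: list[dict], fraction: float = 0.25) -> list[dict]:
    ranked = sorted(examples, key=lambda ex: ex["pass_rate"])
    k = max(1, int(len(examples) * fraction))
    return ranked[:k]

data = [
    {"id": 0, "pass_rate": 0.9},
    {"id": 1, "pass_rate": 0.1},
    {"id": 2, "pass_rate": 0.5},
    {"id": 3, "pass_rate": 0.3},
]
print([ex["id"] for ex in hard_subset(data)])  # [1]
```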

🤗Model

AceCodeRM (Reward Model)

  • AceCodeRM-7B: A reward model trained on AceCodePair-300K from Qwen2.5-Coder-7B-Instruct
  • AceCodeRM-32B: A reward model trained on AceCodePair-300K from Qwen2.5-Coder-32B-Instruct
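The Best-of-N sampling used in the reported benchmark results can be sketched generically: generate N candidate programs, score each with the reward model, and keep the highest-scoring one. The `score` callable below is a toy stand-in for AceCodeRM; the real model usage is shown in examples/run_acecoderm.py.

```python
# Hedged sketch of Best-of-N sampling with a reward model: the scoring
# function here is a toy stand-in, not AceCodeRM's actual API.

def best_of_n(candidates: list[str], score) -> str:
    """Return the candidate with the highest reward score."""
    return max(candidates, key=score)

# Toy stand-in reward: prefer candidates that define a function.
toy_score = lambda code: 1.0 if code.startswith("def ") else 0.0
cands = ["print('hi')", "def solve():\n    return 42"]
print(best_of_n(cands, toy_score))
```

With a real RM, `score` would batch-tokenize each (question, candidate) pair and return the model's scalar reward.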

AceCoder (RL Model)

| Initial Policy Model | Reward Type | Training Dataset | Final RL Model |
|---|---|---|---|
| Qwen2.5-7B-Instruct | AceCodeRM-7B | AceCode-87K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM |
| Qwen2.5-7B-Instruct | Rule | AceCode-87K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule |
| Qwen2.5-Coder-7B-Instruct | AceCodeRM-7B | AceCode-87K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM |
| Qwen2.5-Coder-7B-Instruct | Rule | AceCode-87K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule |
| Qwen2.5-Coder-7B | AceCodeRM-7B | AceCode-87K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM |
| Qwen2.5-Coder-7B | Rule | AceCode-87K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule |

📈 Performance

See our website or paper for detailed performance report.

🚀Quick Start

git submodule init
git submodule update

Use AceCoderRM

First install acecoder as a package:

pip install git+https://github.com/TIGER-AI-Lab/AceCoder.git

Then see examples/run_acecoderm.py for how to use AceCoderRM. The quick command python examples/run_acecoderm.py will run the example.

Training Reward Model

See train/train_rm/README.md for detailed instructions.

Training RL Model

See train/train_rl/README.md for detailed instructions.

Evaluation

We use EvalPlus, BigCodeBench, and LiveCodeBench to evaluate HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4), respectively.
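All of these benchmarks ultimately report a pass@1-style score: a problem counts as solved only if the generated program passes all of its (plus-extended) tests. A minimal sketch of that aggregation, with an illustrative data shape:

```python
# Hedged sketch of pass@1 aggregation over per-test outcomes; the data
# shape is illustrative, not any benchmark harness's actual format.

def pass_at_1(results: list[list[bool]]) -> float:
    """results[i] holds per-test outcomes for problem i's single sample."""
    solved = sum(all(tests) for tests in results)
    return solved / len(results)

# Problems 1 and 3 pass all their tests; problem 2 fails one.
print(pass_at_1([[True, True], [True, False], [True]]))  # ~0.667
```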

Citation

If you find this work helpful, please consider citing:

@article{AceCoder,
    title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
    author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
    journal={arXiv preprint arXiv:2502.01718},
    year={2025}
}
