FoR

Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples

Stars: 74

FoR is the official code repository for the paper 'Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples'. It formulates multi-step reasoning tasks as a flow in three steps: designing reward functions, collecting trajectories, and training LLM policies with the trajectory balance loss. The code supports training and inference in a reproducible conda environment. Users can choose from six tasks, and each task has detailed instructions in its own branch.

README:

Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples

Official code for "Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples". Also check out our [Project Page].

Training & Inference

FoR formulates multi-step reasoning tasks as a flow:

  1. Design a task-specific reward $R(s_n)$ for terminal states.
  2. Collect trajectories using local search.
  3. Train the LLM policy $P_{F}$ with the trajectory balance loss.
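Step 3 relies on the trajectory balance objective from the GFlowNet literature. For a trajectory $\tau = (s_0, \dots, s_n)$, this objective penalizes the squared residual $\big(\log Z + \sum_{t=0}^{n-1} \log P_F(s_{t+1} \mid s_t) - \log R(s_n) - \sum_{t=0}^{n-1} \log P_B(s_t \mid s_{t+1})\big)^2$, where $Z$ is a learned scalar.

The sketch below computes this loss using plain floats in place of model tensors. It is an illustration only, not the repository's code:

- `terminal_reward`, its `eps` floor, `batch_tb_loss`, and the other helper names are hypothetical.
- Setting $P_B = 1$ when no backward log-probabilities are given is an assumption that holds when every state has a unique parent.

```python
import math
from typing import Optional, Sequence, Tuple


def terminal_reward(final_value: float, target: float = 24.0, eps: float = 1e-4) -> float:
    """Hypothetical Game24-style terminal reward R(s_n) (not the repo's reward).

    Returns 1.0 when the final value reaches the target and a small positive
    floor eps otherwise, so that log R(s_n) is always finite.
    """
    return 1.0 if abs(final_value - target) < 1e-6 else eps


def trajectory_balance_loss(
    log_z: float,
    log_pf: Sequence[float],
    log_reward: float,
    log_pb: Optional[Sequence[float]] = None,
) -> float:
    """Squared trajectory balance residual for one trajectory.

    log_pf[t] = log P_F(s_{t+1} | s_t) under the LLM policy.
    log_pb[t] = log P_B(s_t | s_{t+1}). If omitted, P_B is assumed to be 1
    (each state has a unique parent), so every backward term is 0.
    """
    if log_pb is None:
        log_pb = [0.0] * len(log_pf)
    residual = log_z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2


def batch_tb_loss(log_z: float, batch: Sequence[Tuple[Sequence[float], float]]) -> float:
    """Mean TB loss over (log_pf, log_reward) pairs that share one log Z."""
    losses = [trajectory_balance_loss(log_z, lp, lr) for lp, lr in batch]
    return sum(losses) / len(losses)


# Usage: a 3-step trajectory where each step has probability 0.5 and the
# terminal state hits the target, so log R(s_n) = 0.
log_pf = [math.log(0.5)] * 3
log_r = math.log(terminal_reward(24.0))
print(trajectory_balance_loss(0.0, log_pf, log_r))             # log Z = 0: nonzero residual
print(trajectory_balance_loss(3 * math.log(2), log_pf, log_r))  # log Z offsets sum(log_pf)
```

In real training, `log_z` and the `log_pf` terms would be differentiable quantities computed from the LLM, and gradient descent would minimize the averaged loss. Plain floats are used here only to keep the sketch self-contained.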

Code

1) Clone this repository

git clone https://github.com/Yu-Fangxu/FoR.git

2) Prepare the environment

We recommend using conda to set up a reproducible experiment environment. The repository includes environment.yaml for creating a working environment; run:

bash install.sh

3) Choose one of the six tasks to run

cd BlocksWorld   # or Game24, prontoqa, 1D-ARC, "Rubik's_Cube" (quoted because of the apostrophe), GSM8K

See each task's branch for more detailed instructions.

Citation

@article{yu2024flow,
  title={Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking},
  author={Yu, Fangxu and Jiang, Lai and Kang, Haoqiang and Hao, Shibo and Qin, Lianhui},
  journal={arXiv preprint arXiv:2406.05673},
  year={2024}
}