Awesome-System2-Reasoning-LLM

The Awesome-System2-Reasoning-LLM repository is dedicated to a survey paper titled 'From System 1 to System 2: A Survey of Reasoning Large Language Models'. It explores the development of reasoning Large Language Models (LLMs), their foundational technologies, benchmarks, and future directions. The repository provides resources and updates related to the research, tracking the latest developments in the field of reasoning LLMs.

README:

Awesome-System2-Reasoning-LLM


📢 Updates

👀 Introduction

Welcome to the repository for our survey paper, "From System 1 to System 2: A Survey of Reasoning Large Language Models". This repository provides resources and updates related to our research. For a detailed introduction, please refer to our survey paper.

Achieving human-level intelligence requires enhancing the transition from System 1 (fast, intuitive) to System 2 (slow, deliberate) reasoning. While foundational Large Language Models (LLMs) have made significant strides, they still fall short of human-like reasoning in complex tasks. Recent reasoning LLMs, like OpenAI’s o1, have demonstrated expert-level performance in domains such as mathematics and coding, resembling System 2 thinking. This survey explores the development of reasoning LLMs, their foundational technologies, benchmarks, and future directions. We maintain an up-to-date GitHub repository to track the latest developments in this rapidly evolving field.

[Figure] This figure highlights the progression of AI systems, emphasizing the shift from rapid, intuitive approaches to deliberate, reasoning-driven models, and shows how AI has evolved to handle a broader range of real-world challenges.

[Figure] This timeline tracks the development of reasoning LLMs, focusing on the evolution of datasets, foundational technologies, and the release of both commercial and open-source projects.

📒 Table of Contents

Part 1: O1 Replication

  • Open-Reasoner-Zero [Paper]
  • X-R1 [github]
  • Unlock-Deepseek [Blog]
  • Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning [Paper]
  • LLM-R1 [github]
  • mini-deepseek-r1 [Blog]
  • Run DeepSeek R1 Dynamic 1.58-bit [Blog]
  • Simple Reinforcement Learning for Reasoning [Notion]
  • TinyZero [github]
  • Open R1 [github]
  • Search-o1: Agentic Search-Enhanced Large Reasoning Models [Paper]
  • Virgo: A Preliminary Exploration on Reproducing o1-like MLLM [Paper]
  • Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems [Paper]
  • o1-Coder: an o1 Replication for Coding [Paper]
  • HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs [Paper]
  • DRT: Deep Reasoning Translation via Long Chain-of-Thought [Paper]
  • Enhancing LLM Reasoning with Reward-guided Tree Search [Paper]
  • Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions [Paper]
  • O1 Replication Journey--Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? [Paper]
  • O1 Replication Journey: A Strategic Progress Report -- Part 1 [Paper]

Part 2: Process Reward Models

  • PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models [Paper]
  • ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding [Paper]
  • The Lessons of Developing Process Reward Models in Mathematical Reasoning [Paper]
  • ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark [Paper]
  • AutoPSV: Automated Process-Supervised Verifier [Paper]
  • ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [Paper]
  • Free Process Rewards without Process Labels [Paper]
  • Outcome-Refining Process Supervision for Code Generation [Paper]
  • Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations [Paper]
  • OVM: Outcome-supervised Value Models for Planning in Mathematical Reasoning [Paper]
  • Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [Paper]
  • Let's Verify Step by Step [Paper]
  • Improve Mathematical Reasoning in Language Models by Automated Process Supervision [Paper]
  • Making Large Language Models Better Reasoners with Step-Aware Verifier [Paper]
  • Solving Math Word Problems with Process and Outcome-Based Feedback [Paper]
  • Uncertainty-Aware Step-wise Verification with Generative Reward Models [Paper]
  • AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [Paper]
  • Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models [Paper]
  • Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling [Paper]
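
The papers above differ mainly in how step-level rewards are obtained (human labels, Monte Carlo estimation, implicit rewards), but at inference time most of them use the reward model the same way: score each step of a candidate solution and pick the candidate with the best aggregate. A minimal best-of-N sketch, where `score_step` and `toy_prm` are hypothetical stand-ins for a trained process reward model, and min-aggregation (a chain is only as strong as its weakest step) is one common choice among several:

```python
from typing import Callable, List

def prm_best_of_n(
    candidates: List[List[str]],                     # each candidate is a list of reasoning steps
    score_step: Callable[[str, List[str]], float],   # hypothetical PRM: (step, prior steps) -> [0, 1]
) -> List[str]:
    """Pick the candidate whose weakest step is strongest (min-aggregation)."""
    def chain_score(steps: List[str]) -> float:
        # Score every step given the steps before it; a chain is only as
        # reliable as its least reliable step, so aggregate with min().
        return min(score_step(step, steps[:i]) for i, step in enumerate(steps))

    return max(candidates, key=chain_score)

# Toy stand-in for a trained process reward model: penalise steps that admit a guess.
def toy_prm(step: str, context: List[str]) -> float:
    return 0.2 if "guess" in step.lower() else 0.9

if __name__ == "__main__":
    candidates = [
        ["Let x be the unknown.", "Guess x = 7.", "So the answer is 7."],
        ["Let x be the unknown.", "2x + 3 = 11, so 2x = 8.", "x = 4, the answer is 4."],
    ]
    print(prm_best_of_n(candidates, toy_prm))  # selects the second candidate
```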

Part 3: Reinforcement Learning

  • Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [Paper]
  • DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL [Paper]
  • QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search [Paper]
  • Process Reinforcement through Implicit Rewards [Paper]
  • Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [Paper]
  • Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies [Paper]
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Paper]
  • Kimi k1.5: Scaling Reinforcement Learning with LLMs [Paper]
  • Does RLHF Scale? Exploring the Impacts From Data, Model, and Method [Paper]
  • Offline Reinforcement Learning for LLM Multi-Step Reasoning [Paper]
  • ReFT: Representation Finetuning for Language Models [Paper]
  • Deepseekmath: Pushing the limits of mathematical reasoning in open language models [Paper]
  • Reasoning with Reinforced Functional Token Tuning [Paper]
  • Value-Based Deep RL Scales Predictably [Paper]
  • InfAlign: Inference-aware language model alignment [Paper]
  • LIMR: Less is More for RL Scaling [Paper]
  • A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics [Paper]
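
Several of the entries above (e.g. DeepSeek-R1, Logic-RL, DeepScaleR) train with rule-based, verifiable rewards rather than a learned reward model: check the answer format, extract the final answer, and compare it to a reference. A minimal sketch of such a reward function; the `\boxed{...}` convention, the 0.1 format bonus, and exact string matching are illustrative assumptions, not the exact recipe of any listed paper:

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Rule-based reward in the style of RL with verifiable rewards:
    a small bonus for emitting the expected answer format, plus a larger
    bonus when the extracted answer matches the reference exactly."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    format_bonus = 0.1 if match is not None else 0.0
    correct = match is not None and match.group(1).strip() == gold_answer.strip()
    return format_bonus + (1.0 if correct else 0.0)

if __name__ == "__main__":
    print(verifiable_reward(r"2x = 8, so \boxed{4}", "4"))  # 1.1
    print(verifiable_reward(r"the answer is 4", "4"))       # 0.0 (no boxed answer)
```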

Part 4: MCTS/Tree Search

  • On the Convergence Rate of MCTS for the Optimal Value Estimation in Markov Decision Processes [Paper]
  • Search-o1: Agentic Search-Enhanced Large Reasoning Models [Paper]
  • rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking [Paper]
  • ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [Paper]
  • Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning [Paper]
  • HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs [Paper]
  • Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search [Paper]
  • Proposing and solving olympiad geometry with guided tree search [Paper]
  • SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models [Paper]
  • Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning [Paper]
  • CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [Paper]
  • GPT-Guided Monte Carlo Tree Search for Symbolic Regression in Financial Fraud Detection [Paper]
  • MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree [Paper]
  • Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions [Paper]
  • SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation [Paper]
  • Don’t throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding [Paper]
  • AFlow: Automating Agentic Workflow Generation [Paper]
  • Interpretable Contrastive Monte Carlo Tree Search Reasoning [Paper]
  • LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [Paper]
  • Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning [Paper]
  • TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling [Paper]
  • Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination [Paper]
  • RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation [Paper]
  • Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search [Paper]
  • LiteSearch: Efficacious Tree Search for LLM [Paper]
  • Tree Search for Language Model Agents [Paper]
  • Uncertainty-Guided Optimization on Large Language Model Search Trees [Paper]
  • Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B [Paper]
  • Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping [Paper]
  • LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models [Paper]
  • AlphaMath Almost Zero: Process Supervision without Process [Paper]
  • Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search [Paper]
  • MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [Paper]
  • Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [Paper]
  • Stream of Search (SoS): Learning to Search in Language [Paper]
  • Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [Paper]
  • Reasoning with Language Model is Planning with World Model [Paper]
  • Large Language Models as Commonsense Knowledge for Large-Scale Task Planning [Paper]
  • Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training [Paper]
  • Making PPO Even Better: Value-Guided Monte-Carlo Tree Search Decoding [Paper]
  • Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning [Paper]
  • Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [Paper]
  • Fine-grained Conversational Decoding via Isotropic and Proximal Search [Paper]
  • Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata [Paper]
  • Look-back Decoding for Open-Ended Text Generation [Paper]
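
As a rough common denominator of the tree-search methods listed above: treat partial reasoning traces as tree nodes, select promising nodes with a UCT-style rule, expand them with candidate next steps, evaluate with a rollout or value model, and back the value up the tree. A minimal, generic sketch; `propose_steps`, `rollout_value`, and the toy stand-ins below are hypothetical placeholders for LLM calls, not the interface of any listed paper:

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    steps: List[str]                      # partial reasoning trace
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0                    # sum of backed-up rewards

def uct(node: Node, c: float = 1.4) -> float:
    if node.visits == 0:
        return float("inf")               # always try unvisited children first
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(root_steps: List[str],
         propose_steps: Callable[[List[str]], List[str]],
         rollout_value: Callable[[List[str]], float],
         iterations: int = 100) -> List[str]:
    """Generic MCTS over reasoning steps: select by UCT, expand with proposed
    next steps, evaluate a leaf, and backpropagate the reward."""
    root = Node(steps=root_steps)
    for _ in range(iterations):
        node = root
        while node.children:                              # selection
            node = max(node.children, key=uct)
        for step in propose_steps(node.steps):            # expansion
            node.children.append(Node(steps=node.steps + [step], parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = rollout_value(leaf.steps)                 # simulation / value estimate
        while leaf is not None:                            # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).steps

# Toy stand-ins for an LLM step proposer and a value model.
def toy_propose(steps: List[str]) -> List[str]:
    return [f"step {len(steps)}a", f"step {len(steps)}b"]

def toy_value(steps: List[str]) -> float:
    return 1.0 if steps and steps[-1].endswith("b") else random.random() * 0.5

if __name__ == "__main__":
    print(mcts(["problem: ..."], toy_propose, toy_value, iterations=50))
```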

Part 5: Self-Training / Self-Improve

  • rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking [Paper]
  • ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [Paper]
  • Recursive Introspection: Teaching Language Model Agents How to Self-Improve [Paper]
  • B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoner [Paper]
  • ReST-EM: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [Paper]
  • ReFT: Representation Finetuning for Language Models [Paper]
  • Interactive Evolution: A Neural-Symbolic Self-Training Framework for Large Language Models [Paper]
  • CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [Paper]
  • Enhancing Large Vision Language Models with Self-Training on Image Comprehension [Paper]
  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking [Paper]
  • V-STaR: Training Verifiers for Self-Taught Reasoners [Paper]
  • Self-Refine: Iterative Refinement with Self-Feedback [Paper]
  • ReST: Reinforced Self-Training for Language Modeling [Paper]
  • STaR: Bootstrapping Reasoning With Reasoning [Paper]
  • Expert Iteration: Thinking Fast and Slow with Deep Learning and Tree Search [Paper]
  • Self-Improvement in Language Models: The Sharpening Mechanism [Paper]
  • Enabling Scalable Oversight via Self-Evolving Critic [Paper]
  • S2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [Paper]
  • ProgCo: Program Helps Self-Correction of Large Language Models [Paper]
  • SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning [Paper]
  • Large Language Models are Better Reasoners with Self-Verification [Paper]
  • Self-Evaluation Guided Beam Search for Reasoning [Paper]
  • Learning From Correctness Without Prompting Makes LLM Efficient Reasoner [Paper]
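
Many of the self-training entries above (STaR, ReST, ReST-EM, rStar-Math) share the same outer loop: sample rationales from the current model, keep those that reach the correct final answer, fine-tune on the survivors, and repeat. A minimal sketch of that loop; `sample_rationales`, `answer_of`, and `finetune` are hypothetical stand-ins for the model-specific pieces:

```python
from typing import Callable, Dict, List, Tuple

def star_style_self_training(
    problems: List[Tuple[str, str]],                     # (question, gold answer) pairs
    sample_rationales: Callable[[str, int], List[str]],  # hypothetical: question -> candidate CoTs
    answer_of: Callable[[str], str],                     # extract the final answer from a rationale
    finetune: Callable[[List[Dict[str, str]]], None],    # hypothetical: update the model in place
    rounds: int = 3,
    samples_per_problem: int = 8,
) -> List[Dict[str, str]]:
    """Outer loop shared by STaR-like methods: generate, filter by correctness,
    fine-tune, and repeat with the updated model."""
    kept: List[Dict[str, str]] = []
    for _ in range(rounds):
        new_data: List[Dict[str, str]] = []
        for question, gold in problems:
            for rationale in sample_rationales(question, samples_per_problem):
                if answer_of(rationale).strip() == gold.strip():   # keep only correct rationales
                    new_data.append({"prompt": question, "completion": rationale})
                    break                                          # one correct rationale per problem per round
        finetune(new_data)                                         # next round samples from the updated model
        kept.extend(new_data)
    return kept
```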

Part 6: Reflection

  • rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking [Paper]
  • RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems? [Paper]
  • HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs [Paper]
  • Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search [Paper]
  • AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning [Paper]
  • Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS [Paper]
  • Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions [Paper]
  • LLaVA-o1: Let Vision Language Models Reason Step-by-Step [Paper]
  • Vision-Language Models Can Self-Improve Reasoning via Reflection [Paper]
  • LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [Paper]
  • Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers [Paper]
  • Refiner: Restructure Retrieved Content Efficiently to Advance Question-Answering Capabilities [Paper]
  • Reflection-Tuning: An Approach for Data Recycling [Paper]
  • Learning From Mistakes Makes LLM Better Reasoner [Paper]
  • SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning [Paper]
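
A pattern shared by many of the reflection papers above is a generate, critique, revise loop in the spirit of Self-Refine. A minimal sketch; `generate`, `critique`, and `revise` are hypothetical placeholders for model calls, and stopping once the critique passes (or a round budget is exhausted) is an illustrative choice:

```python
from typing import Callable, Tuple

def reflect_and_revise(
    question: str,
    generate: Callable[[str], str],                     # hypothetical: question -> draft answer
    critique: Callable[[str, str], Tuple[bool, str]],   # hypothetical: (question, answer) -> (ok?, feedback)
    revise: Callable[[str, str, str], str],             # hypothetical: (question, answer, feedback) -> new answer
    max_rounds: int = 3,
) -> str:
    """Self-Refine-style loop: draft an answer, ask the model to critique it,
    and revise until the critique passes or the round budget is spent."""
    answer = generate(question)
    for _ in range(max_rounds):
        ok, feedback = critique(question, answer)
        if ok:
            break
        answer = revise(question, answer, feedback)
    return answer
```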

Part 7: Efficient System2

  • O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [Paper]
  • Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking [Paper]
  • DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models [Paper]
  • B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoner [Paper]
  • Token-Budget-Aware LLM Reasoning [Paper]
  • Training Large Language Models to Reason in a Continuous Latent Space [Paper]
  • Guiding Language Model Reasoning with Planning Tokens [Paper]
  • One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs [Paper]
  • Small Models Struggle to Learn from Strong Reasoners [Paper]
  • TokenSkip: Controllable Chain-of-Thought Compression in LLMs [Paper]
  • SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [Paper]
  • Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning [Paper]
  • Thinking Preference Optimization [Paper]
  • Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? [Paper]
  • Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options [Paper]
  • CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction [Paper]
  • OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning [Paper]
  • LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning [Paper]
  • Atom of Thoughts for Markov LLM Test-Time Scaling [Paper]
  • Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity [Paper]
  • Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models [Paper]
  • Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models [Paper]
  • Titans: Learning to Memorize at Test Time [Paper]
  • MoBA: Mixture of Block Attention for Long-Context LLMs [Paper]
  • AutoReason: Automatic Few-Shot Reasoning Decomposition [Paper]
  • Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning [Paper]
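
Much of the efficiency work above (e.g. DynaThink, Token-Budget-Aware LLM Reasoning) comes down to deciding how much deliberate reasoning a query deserves. A minimal routing sketch that uses agreement among cheap direct samples as the difficulty signal; `answer_fast`, `answer_slow`, the vote count, and the threshold are all illustrative assumptions:

```python
from collections import Counter
from typing import Callable, List

def fast_or_slow(
    question: str,
    answer_fast: Callable[[str], str],   # hypothetical: direct answer, no chain of thought
    answer_slow: Callable[[str], str],   # hypothetical: long chain-of-thought answer
    n_votes: int = 5,
    agreement_threshold: float = 0.8,
) -> str:
    """Route between System-1-style and System-2-style decoding: if cheap
    direct samples already agree, return the majority answer; otherwise
    spend the extra compute on deliberate reasoning."""
    votes: List[str] = [answer_fast(question) for _ in range(n_votes)]
    answer, count = Counter(votes).most_common(1)[0]
    if count / n_votes >= agreement_threshold:
        return answer                    # confident fast path
    return answer_slow(question)         # fall back to slow, deliberate reasoning
```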

Part 8: Explainability

  • Agents Thinking Fast and Slow: A Talker-Reasoner Architecture [Paper]
  • What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective [Paper]
  • When a Language Model is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI o1 [Paper]
  • The Impact of Reasoning Step Length on Large Language Models [Paper]
  • Distilling System 2 into System 1 [Paper]
  • System 2 Attention (is something you might need too) [Paper]
  • SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities [Paper]
  • ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails [Paper]
  • H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking [Paper]
  • BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack [Paper]

Part 9: Multimodal Agent-Related Slow-Fast Systems

  • Diving into Self-Evolving Training for Multimodal Reasoning [Paper]
  • Visual Agents as Fast and Slow Thinkers [Paper]
  • Virgo: A Preliminary Exploration on Reproducing o1-like MLLM [Paper]
  • Scaling Inference-Time Search With Vision Value Model for Improved Visual Comprehension [Paper]
  • Slow Perception: Let's Perceive Geometric Figures Step-by-Step [Paper]
  • AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning [Paper]
  • LLaVA-o1: Let Vision Language Models Reason Step-by-Step [Paper]
  • Vision-Language Models Can Self-Improve Reasoning via Reflection [Paper]
  • I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models [Paper]
  • RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision [Paper]

Part 10: Benchmark and Datasets

  • PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models [Paper]
  • MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [Paper]
  • Do NOT Think That Much for 2+3=? On the Overthinking of o1-like LLMs [Paper]
  • A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? [Paper]
  • EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking [Paper]
  • SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines [Paper]
  • Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models [Paper]
  • FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI [Paper]
  • Evaluation of OpenAI o1: Opportunities and Challenges of AGI [Paper]
  • MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations [Paper]
  • LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion [Paper]
  • Humanity's Last Exam [Paper]

⭐ Star History

Star History Chart
