Awesome-LLM-Reasoning-Openai-o1-Survey

Related works and background techniques for OpenAI o1

Stars: 184

The repository 'Awesome LLM Reasoning Openai-o1 Survey' provides a collection of survey papers and related works on OpenAI o1, focusing on topics such as LLM reasoning, self-play reinforcement learning, complex logic reasoning, and scaling law. It includes papers from various institutions and researchers, showcasing advancements in reasoning bootstrapping, reasoning scaling law, self-play learning, step-wise and process-based optimization, and applications beyond math. The repository serves as a valuable resource for researchers interested in exploring the intersection of language models and reasoning techniques.

README:

Awesome LLM Reasoning Openai-o1 Survey

The related works and background techniques about OpenAI o1, including LLM reasoning, self-play reinforcement learning, complex logic reasoning, scaling law, etc.

Introduction

Survey Papers

  • A Survey on Self-play Methods in Reinforcement Learning [Paper] (2024)
    • Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang
    • Tencent, Tsinghua

Related Papers

Complex Logical Reasoning

  • Generative Language Modeling for Automated Theorem Proving [Paper] (2020)
    • Stanislas Polu, Ilya Sutskever
    • OpenAI
  • Hypothesis Search: Inductive Reasoning with Language Models [Paper] (ICLR 2024)
    • Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, Noah D. Goodman
    • Stanford, Autodesk Research
  • Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [Paper] (ICLR 2024)
    • Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
    • MIT, Allen AI, UW, USC
  • Training Verifiers to Solve Math Word Problems [Paper] (2021)
    • Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
    • OpenAI
  • To CoT or not to CoT? Chain-of-thought Helps Mainly on Math and Symbolic Reasoning [Paper] (2024.9)
    • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett
    • The University of Texas at Austin, Johns Hopkins University, Princeton University

Reasoning Bootstrapping

  • STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning [Paper] [Github] (NeurIPS 2022)
    • Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman
    • Stanford, Google
  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking [Paper] [Github] (2024)
    • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman
    • Stanford, Notbad AI
  • Training Chain-of-thought via Latent-variable Inference [Paper] (NeurIPS 2023)
    • Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous
    • Google
  • Chain-of-thought Reasoning without Prompting [Paper] (2024)
    • Xuezhi Wang, Denny Zhou
    • Google DeepMind
  • Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers [Paper] [Github] (2024)
    • Zhenting Qi, Mingyuan Ma, Jiahang Xu, Li Lyna Zhang, Fan Yang, Mao Yang
    • MSRA, Harvard University

Reasoning Scaling Law

  • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [Paper] (2024)
    • Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini
    • Stanford, Oxford, Google DeepMind
  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters [Paper] (2024)
    • Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
    • UC Berkeley, Google DeepMind
  • An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models [Paper] (2024)
    • Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang
    • Tsinghua, CMU
  • Training Language Models to Self-Correct via Reinforcement Learning [Paper] (2024)
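    • Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust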
    • Google DeepMind
  • From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond [Paper: https://arxiv.org/abs/2411.03590] (2024)
    • Harsha Nori, Naoto Usuyama, Nicholas King, Scott Mayer McKinney, Xavier Fernandes, Sheng Zhang, Eric Horvitz
    • Microsoft, OpenAI

Self-play Learning

  • Mastering Chess and Shogi by Self-play with a General Reinforcement Learning Algorithm [Paper] (2017)
    • David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis
    • Google DeepMind
  • Language Models Can Teach Themselves to Program Better [Paper] [Github] (ICLR 2023)
    • Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai
    • Microsoft Research, MIT
  • Large Language Models Can Self-Improve [Paper] (2022)
    • Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
    • University of Illinois at Urbana-Champaign, Google
  • Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [Paper] [Github] (ICML 2024)
    • Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
    • UCLA
  • Self-Play Preference Optimization for Language Model Alignment [Paper] [Github] (2024)
    • Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu
    • UCLA
  • Scalable Online Planning via Reinforcement Learning Fine-Tuning [Paper] (NeurIPS 2021)
    • Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown
  • Generative Verifiers: Reward Modeling as Next-Token Prediction [Paper] (2024)
    • Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal
    • Google DeepMind
  • Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B [Paper] (2024)
    • Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang
    • Fudan University, Shanghai AI Lab
  • Interpretable Contrastive Monte Carlo Tree Search Reasoning [Paper] (2024)
    • Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen
    • The University of Sydney, Peking University, Xiaohongshu, Shanghai AI Lab, Tsinghua, HKUST

Step-wise and Process-based Optimization

  • Solving Math Word Problems with Process- and Outcome-based Feedback [Paper] (2022)
    • Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins
    • Google DeepMind
  • Thinking Fast and Slow With Deep Learning and Tree Search [Paper] (NeurIPS 2017)
    • Thomas Anthony, Zheng Tian, David Barber
    • University College London, Alan Turing Institute
  • Let’s Verify Step by Step [Paper] (2023)
    • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
    • OpenAI
  • OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning [Paper] (Findings of NAACL 2024)
    • Fei Yu, Anningzhe Gao, Benyou Wang
    • The Chinese University of Hong Kong, Shenzhen (CUHKSZ) & Shenzhen Research Institute of Big Data (SRIBD)
  • LLM Critics Help Catch LLM Bugs [Paper] (2024)
    • Nat McAleese, Rai Michael Pokorny, Juan Felipe Ceron Uribe, Evgenia Nitishinskaya, Maja Trebacz, Jan Leike
    • OpenAI
  • Self-critiquing Models for Assisting Human Evaluators [Paper] (2022)
    • William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Long Ouyang, Jonathan Ward, Jan Leike
    • OpenAI
  • Improve Mathematical Reasoning in Language Models by Automated Process Supervision [Paper] (2024)
    • Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi
    • Google DeepMind
  • Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [Paper] (2024)
    • Chaojie Wang, Yanchen Deng, Zhiyi Lyu, Liang Zeng, Jujie He, Shuicheng Yan, Bo An
    • Skywork AI, NTU
  • Math-shepherd: Verify and Reinforce LLMs step-by-step without Human Annotations [Paper] (ACL 2024)
    • Peiyi Wang, Lei Li, Zhihong Shao, Runxin Xu, Damai Dai, Yifei Li, Deli Chen, Yu Wu, Zhifang Sui
    • Peking University, DeepSeek AI, HKU, Tsinghua University, The Ohio State University

Social News

Applications beyond Math

  • HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs [Paper] (2024)
    • Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang
    • The Chinese University of Hong Kong, Shenzhen (CUHKSZ)
  • o1-Coder: an o1 Replication for Coding [Paper] (2024)
    • Yuxiang Zhang, Shangxi Wu, Yuqi Yang, Jiangming Shu, Jinlin Xiao, Chao Kong, Jitao Sang
    • Beijing Jiaotong University

Open-source Projects

Communication Groups

Contributions

We welcome every researcher who contributes to this repository.
