awesome-llm-planning-reasoning

A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning materials.

Stars: 117

The 'Awesome LLMs Planning Reasoning' repository is a curated collection on the planning and reasoning capabilities of Large Language Models (LLMs). It gathers research papers, code repositories, and benchmarks covering techniques, reasoning limitations, and standardized evaluations of LLM performance on complex cognitive tasks. It is intended as a reference for researchers, developers, and enthusiasts who want to follow the advances and open challenges in applying LLMs to planning and reasoning in real-world scenarios.

README:

About

Welcome to the Awesome LLMs Planning Reasoning repository! This collection is dedicated to exploring the rapidly evolving field of Large Language Models (LLMs) and their capabilities in planning and reasoning.

Overview

As LLMs continue to demonstrate remarkable success in Natural Language Understanding (NLU) and Natural Language Generation (NLG), researchers are increasingly interested in assessing their abilities beyond traditional NLP tasks. One of the most promising and challenging areas of study is understanding how well LLMs can perform tasks that require planning and reasoning. These capabilities are essential for leveraging LLMs in more complex, real-world scenarios, such as autonomous decision-making, problem-solving, and strategic thinking. However, recent research suggests that LLMs often struggle with reasoning tasks that are relatively simple for most humans, highlighting the limitations of these models in this critical area.

This repository is a curated list of research papers, code repositories, and benchmarks that focus on the intersection of LLMs with planning and reasoning tasks. Here, you'll find:

  • Techniques: Innovative methods that enable LLMs to reason and plan effectively, such as Chain-of-Thought prompting and Tree of Thoughts.
  • Reasoning Limitations: Critical investigations that explore the limitations and challenges LLMs face in planning and reasoning tasks.
  • Benchmarks: Standardized tests and evaluations designed to measure the performance of LLMs in these complex tasks.
  • Miscellaneous Papers: Papers related to the field of LLMs and reasoning, but not directly focused on planning tasks.
  • Additional Resources: Supplementary materials such as slides, dissertations, and other resources that provide further insights into LLM planning and reasoning.
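
As a rough illustration of the Chain-of-Thought prompting mentioned under Techniques, the sketch below assembles a few-shot prompt in which each worked example shows its intermediate reasoning before the final answer. The names `Exemplar` and `build_cot_prompt`, the `Q:`/`A:` layout, and the "The answer is" suffix are assumptions made for this example and are not taken from any paper or library listed here. No model is called; the function only builds the prompt string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    """One worked example: a question, its written-out reasoning, and the answer."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Build a few-shot prompt whose examples include step-by-step reasoning.

    The prompt ends with an open "A:" so that a model would continue with
    its own reasoning for the new question.
    """
    parts = []
    for ex in exemplars:
        # Hypothetical layout: reasoning first, then a fixed answer phrase.
        parts.append(f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [
        Exemplar(
            question="Roger has 5 balls and buys 6 more. How many balls does he have?",
            reasoning="He starts with 5 balls and adds 6, so 5 + 6 = 11.",
            answer="11",
        )
    ]
    print(build_cot_prompt(demo, "A shelf holds 3 books and 4 more are added. How many books?"))
```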

Whether you're a researcher, developer, or enthusiast, this repository serves as a comprehensive resource for staying updated on the latest advancements and understanding the current challenges in the domain of LLMs' planning and reasoning abilities. Dive in and explore the fascinating world where language models meet high-level cognitive tasks!

Techniques

| Paper | Link | Code | Venue | Date | Other |
|---|---|---|---|---|---|
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | arXiv | -- | NeurIPS 22 | 28 Jan 2022 | Video |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | arXiv | -- | ICLR 23 | 7 Mar 2023 | Video |
| ReAct: Synergizing Reasoning and Acting in Language Models | arXiv | GitHub | ICLR 23 | 10 Mar 2023 | Project, Video |
| LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | arXiv | GitHub | ICCV 23 | 30 Mar 2023 | Project |
| Least-To-Most Prompting Enables Complex Reasoning In Large Language Models | arXiv | -- | ICLR 23 | 16 Apr 2023 | |
| Chain-of-Symbol Prompting Elicits Planning in Large Language Models | arXiv | GitHub | ICLR 24 | 17 May 2023 | |
| PlaSma: Procedural Knowledge Models for Language based Planning and Re-Planning | arXiv | GitHub | ICLR 24 | 26 Jul 2023 | |
| Better Zero-Shot Reasoning with Role-Play Prompting | arXiv | GitHub | NAACL 24 | 15 Aug 2023 | |
| LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | arXiv | GitHub | arXiv | 27 Sep 2023 | |
| Reasoning with Language Model is Planning with World Model | arXiv | GitHub | EMNLP 23 | 23 Oct 2023 | |
| Large Language Models as Commonsense Knowledge for Large-Scale Task Planning | arXiv | GitHub | NeurIPS 23 | 30 Oct 2023 | Project |
| PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization | arXiv | GitHub | ICLR 24 | 7 Dec 2023 | |
| Tree of Thoughts: Deliberate Problem Solving with Large Language Models | arXiv | GitHub | NeurIPS 23 | 3 Dec 2023 | Video |
| Learning adaptive planning representations with natural language guidance | arXiv | -- | arXiv | 13 Dec 2023 | |
| The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | arXiv | GitHub | ICLR 24 | 21 Dec 2023 | |
| Large Language Models can Learn Rules | arXiv | -- | arXiv | 24 Apr 2024 | |
| What’s the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models | arXiv | -- | arXiv | 22 May 2024 | |
| Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models | arXiv | GitHub | arXiv | 6 Jun 2024 | |
| Large Language Models Can Learn Temporal Reasoning | arXiv | GitHub | ACL 24 | 11 Jun 2024 | |
| Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking | arXiv | GitHub | arXiv | 24 Jun 2024 | |
| Tree Search for Language Model Agents | arXiv | GitHub | arXiv | 1 Jul 2024 | Project |
| Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | arXiv | GitHub | ICLR 24 | 24 Jul 2024 | Project |
| RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | arXiv | -- | arXiv | 6 Aug 2024 | |
| Automating Thought of Search: A Journey Towards Soundness and Completeness | arXiv | -- | arXiv | 21 Aug 2024 | |
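
The self-consistency idea listed above (sample several reasoning paths, then keep the most frequent final answer) can be sketched as a simple majority vote. This is a minimal sketch under stated assumptions: the helper names `extract_answer` and `majority_vote` are hypothetical, the completions are assumed to end with the phrase "The answer is X.", and the step that samples completions from a model is deliberately left out.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of a completion ending in 'The answer is X.'.

    Returns None when the assumed answer phrase is not found.
    """
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", completion.strip())
    return match.group(1) if match else None


def majority_vote(completions: List[str]) -> Optional[str]:
    """Return the most common extracted answer across sampled completions."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    # most_common(1) yields the answer with the highest count.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Stand-ins for several sampled reasoning paths on the same question.
    samples = [
        "5 + 6 = 11. The answer is 11.",
        "He has 5, then 6 more, so 12. The answer is 12.",
        "Adding 6 to 5 gives 11. The answer is 11.",
        "I am not sure.",
    ]
    print(majority_vote(samples))
```

Completions that do not contain the assumed answer phrase are skipped instead of counted, so a malformed sample cannot win the vote.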

Reasoning Limitations

| Paper | Link | Code | Venue | Date | Other |
|---|---|---|---|---|---|
| Understanding the Capabilities of Large Language Models for Automated Planning | arXiv | -- | arXiv | 25 May 2023 | |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | arXiv | GitHub | arXiv | 8 Aug 2023 | |
| Evaluating Cognitive Maps and Planning in Large Language Models with CogEval | arXiv | GitHub | NeurIPS 23 | 2 Nov 2023 | |
| On the Planning Abilities of Large Language Models: A Critical Investigation | arXiv | GitHub | NeurIPS 23 | 6 Nov 2023 | |
| Large Language Models Cannot Self-Correct Reasoning Yet | arXiv | -- | ICLR 24 | 14 Mar 2024 | |
| Dissociating language and thought in large language models | arXiv | -- | Trends in Cognitive Sciences | 23 Mar 2024 | |
| Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | arXiv | GitHub | NAACL 24 | 28 Mar 2024 | |
| Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | arXiv | -- | arXiv | 13 May 2024 | Video |
| On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models | arXiv | -- | arXiv | 22 May 2024 | |
| Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models | arXiv | GitHub | EACL 24 | 24 May 2023 | |
| Can Graph Learning Improve Task Planning? | arXiv | GitHub | arXiv | 29 May 2024 | |
| Graph-enhanced Large Language Models in Asynchronous Plan Reasoning | arXiv | GitHub | ICML 24 | 3 Jun 2024 | |
| When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | arXiv | GitHub | ACL 24 | 6 Jun 2024 | |
| Chain of Thoughtlessness? An Analysis of CoT in Planning | arXiv | -- | arXiv | 6 Jun 2024 | |
| Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning | arXiv | GitHub | ACL 24 | 7 Jun 2024 | |
| Can Language Models Serve as Text-Based World Simulators? | arXiv | GitHub | ACL 24 | 10 Jun 2024 | |
| LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks | arXiv | -- | ICML 24 | 12 Jun 2024 | Video |
| Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | arXiv | GitHub | arXiv | 13 Jul 2024 | |
| On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks | arXiv | -- | arXiv | 3 Aug 2024 | |
| Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | arXiv | -- | arXiv | 15 Aug 2024 | |

Benchmarks

| Paper | Link | Code | Venue | Date | Other |
|---|---|---|---|---|---|
| Benchmarks for Automated Commonsense Reasoning: A Survey | arXiv | -- | arXiv | 22 Feb 2023 | |
| BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology | arXiv | GitHub | EMNLP 24 | 16 Oct 2023 | |
| AgentBench: Evaluating LLMs as Agents | arXiv | GitHub | ICLR 24 | 25 Oct 2023 | |
| PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change | arXiv | GitHub | NeurIPS 23 Track on Datasets and Benchmarks | 23 Nov 2023 | |
| Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | arXiv | GitHub | arXiv | 3 Apr 2024 | Project |
| WebArena: A Realistic Web Environment for Building Autonomous Agents | arXiv | GitHub | NeurIPS 23 Workshop | 16 Apr 2024 | Project |
| Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning | arXiv | -- | arXiv | 3 Jun 2024 | HuggingFace |
| Open Grounded Planning: Challenges and Benchmark Construction | arXiv | GitHub | ACL 24 | 5 Jun 2024 | |
| NATURAL PLAN: Benchmarking LLMs on Natural Language Planning | arXiv | -- | arXiv | 6 Jun 2024 | |
| ResearchArena: Benchmarking LLMs’ Ability to Collect and Organize Information as Research Agents | arXiv | | arXiv | 13 Jun 2024 | |
| OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | arXiv | GitHub | arXiv | 22 Jun 2024 | Project |
| TravelPlanner: A Benchmark for Real-World Planning with Language Agents | arXiv | GitHub | ICML 24 | 23 Jun 2024 | Project |
| GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | arXiv | GitHub | arXiv | 3 Jul 2024 | |
| AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | arXiv | GitHub | ACL 24 | 26 Jul 2024 | Project, Video |

Miscellaneous

| Paper | Link | Code | Venue | Date | Other |
|---|---|---|---|---|---|
| Lost in the Middle: How Language Models Use Long Contexts | arXiv | -- | TACL 23 | 20 Nov 2023 | |
| The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 | arXiv | -- | arXiv | 8 Dec 2023 | Project |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | arXiv | -- | ACL 24 | 19 Feb 2024 | Project |
| Better & Faster Large Language Models via Multi-token Prediction | arXiv | -- | arXiv | 30 Apr 2024 | Video, HuggingFace |
| Learning Iterative Reasoning through Energy Diffusion | arXiv | GitHub | ICML 24 | 17 Jun 2024 | Project |
| Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | arXiv | GitHub | arXiv | 20 Jun 2024 | |
| What's the Magic Word? A Control Theory of LLM Prompting | arXiv | -- | arXiv | 3 Jul 2024 | |
| AGENTGEN: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | arXiv | -- | arXiv | 1 Aug 2024 | |
| Generative Verifiers: Reward Modeling as Next-Token Prediction | arXiv | -- | arXiv | 27 Aug 2024 | |

Additional Resources

| Resource | Link |
|---|---|
| Yochan Tutorials on Large Language Models and Planning | link |
| On The Capabilities and Risks of Large Language Models | link |
| Large Language Models for Reasoning | link |
| LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | link |
| Physics of Language Models | link |

Acknowledgement

If you want to say thank you and/or support the active development of Awesome LLMs for Planning and Reasoning, add a GitHub Star to the project.

Together, we can make Awesome LLMs for Planning and Reasoning better!

Contributing

First off, thanks for taking the time to contribute! Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make will benefit everybody else and are greatly appreciated.

Authors & contributors

This repository was originally set up by Sambhav Khurana.

For a full list of all authors and contributors, see the contributors page.
