# awesome-llm-planning-reasoning
A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning materials.
Stars: 117
The 'Awesome LLMs Planning Reasoning' repository is a curated collection focusing on exploring the capabilities of Large Language Models (LLMs) in planning and reasoning tasks. It includes research papers, code repositories, and benchmarks that delve into innovative techniques, reasoning limitations, and standardized evaluations related to LLMs' performance in complex cognitive tasks. The repository serves as a comprehensive resource for researchers, developers, and enthusiasts interested in understanding the advancements and challenges in leveraging LLMs for planning and reasoning in real-world scenarios.
README:
Welcome to the Awesome LLMs Planning Reasoning repository! This collection is dedicated to exploring the rapidly evolving field of Large Language Models (LLMs) and their capabilities in planning and reasoning.
As LLMs continue to demonstrate remarkable success in Natural Language Understanding (NLU) and Natural Language Generation (NLG), researchers are increasingly interested in assessing their abilities beyond traditional NLP tasks. One of the most promising and challenging areas of study is understanding how well LLMs can perform tasks that require planning and reasoning. These capabilities are essential for leveraging LLMs in more complex, real-world scenarios, such as autonomous decision-making, problem-solving, and strategic thinking. However, recent research suggests that LLMs often struggle with reasoning tasks that are relatively simple for most humans, highlighting the limitations of these models in this critical area.
This repository is a curated list of research papers, code repositories, and benchmarks that focus on the intersection of LLMs with planning and reasoning tasks. Here, you'll find:
- Techniques: Innovative methods that enable LLMs to reason and plan effectively, such as Chain-of-Thought prompting and Tree of Thoughts.
- Reasoning Limitations: Critical investigations that explore the limitations and challenges LLMs face in planning and reasoning tasks.
- Benchmarks: Standardized tests and evaluations designed to measure the performance of LLMs in these complex tasks.
- Miscellaneous Papers: Papers related to the field of LLMs and reasoning, but not directly focused on planning tasks.
- Additional Resources: Supplementary materials such as slides, dissertations, and other resources that provide further insights into LLM planning and reasoning.
Whether you're a researcher, developer, or enthusiast, this repository serves as a comprehensive resource for staying updated on the latest advancements and understanding the current challenges in the domain of LLMs' planning and reasoning abilities. Dive in and explore the fascinating world where language models meet high-level cognitive tasks!
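To make the first category concrete, here is a minimal sketch of Chain-of-Thought prompting combined with self-consistency (majority voting over sampled reasoning chains), two of the techniques listed below. The `sample_chains` function is a hypothetical stand-in for repeated calls to a real LLM at temperature > 0; it returns canned chains purely for illustration.

```python
# Sketch: Chain-of-Thought prompting + self-consistency (majority vote).
# `sample_chains` is a stand-in for sampled LLM completions, not a real API.
import re
from collections import Counter

COT_PROMPT = (
    "Q: A farmer has 15 sheep and buys 8 more, then sells 5. "
    "How many sheep remain?\n"
    "A: Let's think step by step."
)

def sample_chains(prompt: str, n: int = 5) -> list[str]:
    # Stand-in for n sampled LLM completions of the CoT prompt.
    return [
        "15 + 8 = 23, then 23 - 5 = 18. The answer is 18.",
        "Start with 15, add 8 to get 23, subtract 5. The answer is 18.",
        "15 + 8 = 23; 23 - 5 = 18. The answer is 18.",
        "15 - 5 = 10, plus 8 is 18. The answer is 18.",
        "15 + 8 = 22, minus 5 is 17. The answer is 17.",  # one faulty chain
    ][:n]

def self_consistent_answer(prompt: str, n: int = 5) -> str:
    # Extract each chain's final answer and take the majority vote,
    # marginalizing out individual faulty reasoning paths.
    answers = []
    for chain in sample_chains(prompt, n):
        match = re.search(r"The answer is (\d+)", chain)
        if match:
            answers.append(match.group(1))
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer(COT_PROMPT))  # majority answer: "18"
```

The point of self-consistency is visible even in this toy: one of the five sampled chains reasons incorrectly, but the vote over final answers still recovers the right result.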
## Techniques

Paper | Link | Code | Venue | Date | Other |
---|---|---|---|---|---|
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | arXiv | -- | NeurIPS 22 | 28 Jan 2022 | Video |
Self-Consistency Improves Chain of Thought Reasoning in Language Models | arXiv | -- | ICLR 23 | 7 Mar 2023 | Video |
ReAct: Synergizing Reasoning and Acting in Language Models | arXiv | GitHub | ICLR 23 | 10 Mar 2023 | Project Video |
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | arXiv | GitHub | ICCV 23 | 30 Mar 2023 | Project |
Least-To-Most Prompting Enables Complex Reasoning In Large Language Models | arXiv | -- | ICLR 23 | 16 Apr 2023 | |
Chain-of-Symbol Prompting Elicits Planning in Large Language Models | arXiv | GitHub | ICLR 24 | 17 May 2023 | |
PlaSma: Procedural Knowledge Models for Language based Planning and Re-Planning | arXiv | GitHub | ICLR 24 | 26 Jul 2023 | |
Better Zero-Shot Reasoning with Role-Play Prompting | arXiv | GitHub | NAACL 24 | 15 Aug 2023 | |
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | arXiv | GitHub | arXiv | 27 Sep 2023 | |
Reasoning with Language Model is Planning with World Model | arXiv | GitHub | EMNLP 23 | 23 Oct 2023 | |
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning | arXiv | GitHub | NeurIPS 23 | 30 Oct 2023 | Project |
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization | arXiv | GitHub | ICLR 24 | 7 Dec 2023 | |
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | arXiv | GitHub | NeurIPS 23 | 3 Dec 2023 | Video |
Learning adaptive planning representations with natural language guidance | arXiv | -- | arXiv | 13 Dec 2023 | |
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | arXiv | GitHub | ICLR 24 | 21 Dec 2023 | |
Large Language Models can Learn Rules | arXiv | -- | arXiv | 24 Apr 2024 | |
What’s the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models | arXiv | -- | arXiv | 22 May 2024 | |
Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models | arXiv | GitHub | arXiv | 6 Jun 2024 | |
Large Language Models Can Learn Temporal Reasoning | arXiv | GitHub | ACL 24 | 11 Jun 2024 | |
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking | arXiv | GitHub | arXiv | 24 Jun 2024 | |
Tree Search for Language Model Agents | arXiv | GitHub | arXiv | 1 Jul 2024 | Project |
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | arXiv | GitHub | ICLR 24 | 24 Jul 2024 | Project |
RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | arXiv | -- | arXiv | 6 Aug 2024 | |
Automating Thought of Search: A Journey Towards Soundness and Completeness | arXiv | -- | arXiv | 21 Aug 2024 | |
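The ReAct paper listed above interleaves reasoning with tool use. The following is a minimal sketch of that Thought / Action / Observation loop; `fake_llm` and the one-entry lookup table are illustrative stand-ins, not the paper's actual setup.

```python
# Sketch of a ReAct-style loop: the model alternates Thought / Action /
# Observation steps until it emits a final answer. `fake_llm` is a
# hypothetical stand-in for an LLM call that conditions on the transcript.
def fake_llm(transcript: str) -> str:
    # Stand-in policy: first request a lookup, then answer from the observation.
    if "Observation:" not in transcript:
        return "Thought: I need the capital.\nAction: lookup[France]"
    return "Thought: The observation answers it.\nFinal Answer: Paris"

TOOLS = {"lookup": {"France": "The capital of France is Paris."}.get}

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action:" in step:
            # Parse "Action: tool[arg]" and append the tool's observation,
            # grounding the next reasoning step in external feedback.
            action = step.split("Action:")[1].strip()
            tool, arg = action.split("[", 1)
            obs = TOOLS[tool](arg.rstrip("]")) or "no result"
            transcript += f"\nObservation: {obs}"
    return "no answer"

print(react("What is the capital of France?"))  # -> Paris
```

The design point is that actions are executed by the environment, so observations fed back into the transcript are grounded rather than hallucinated.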
## Reasoning Limitations

Paper | Link | Code | Venue | Date | Other |
---|---|---|---|---|---|
Understanding the Capabilities of Large Language Models for Automated Planning | arXiv | -- | arXiv | 25 May 2023 | |
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | arXiv | GitHub | arXiv | 8 Aug 2023 | |
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval | arXiv | GitHub | NeurIPS 23 | 2 Nov 2023 | |
On the Planning Abilities of Large Language Models : A Critical Investigation | arXiv | GitHub | NeurIPS 23 | 6 Nov 2023 | |
Large Language Models Cannot Self-Correct Reasoning Yet | arXiv | -- | ICLR 24 | 14 Mar 2024 | |
Dissociating language and thought in large language models | arXiv | -- | Trends in Cognitive Sciences | 23 Mar 2024 | |
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | arXiv | GitHub | NAACL 24 | 28 Mar 2024 | |
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | arXiv | -- | arXiv | 13 May 2024 | Video |
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models | arXiv | -- | arXiv | 22 May 2024 | |
Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models | arXiv | GitHub | EACL 24 | 24 May 2023 | |
Can Graph Learning Improve Task Planning? | arXiv | GitHub | arXiv | 29 May 2024 | |
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning | arXiv | GitHub | ICML 24 | 3 Jun 2024 | |
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | arXiv | GitHub | ACL 24 | 6 Jun 2024 | |
Chain of Thoughtlessness? An Analysis of CoT in Planning | arXiv | -- | arXiv | 6 Jun 2024 | |
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning | arXiv | GitHub | ACL 24 | 7 Jun 2024 | |
Can Language Models Serve as Text-Based World Simulators? | arXiv | GitHub | ACL 24 | 10 Jun 2024 | |
LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks | arXiv | -- | ICML 24 | 12 Jun 2024 | Video |
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | arXiv | GitHub | arXiv | 13 Jul 2024 | |
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks | arXiv | -- | arXiv | 3 Aug 2024 | |
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | arXiv | -- | arXiv | 15 Aug 2024 | |
## Benchmarks

Paper | Link | Code | Venue | Date | Other |
---|---|---|---|---|---|
Benchmarks for Automated Commonsense Reasoning: A Survey | arXiv | -- | arXiv | 22 Feb 2023 | |
BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology | arXiv | GitHub | EMNLP 24 | 16 Oct 2023 | |
AgentBench: Evaluating LLMs as Agents | arXiv | GitHub | ICLR 24 | 25 Oct 2023 | |
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change | arXiv | GitHub | NeurIPS 23 Track on Datasets and Benchmarks | 23 Nov 2023 | |
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | arXiv | GitHub | arXiv | 3 Apr 2024 | Project |
WebArena: A Realistic Web Environment for Building Autonomous Agents | arXiv | GitHub | NeurIPS 23 Workshop | 16 Apr 2024 | Project |
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning | arXiv | -- | arXiv | 3 Jun 2024 | HuggingFace |
Open Grounded Planning: Challenges and Benchmark Construction | arXiv | GitHub | ACL 24 | 5 Jun 2024 | |
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning | arXiv | -- | arXiv | 6 Jun 2024 | |
ResearchArena: Benchmarking LLMs’ Ability to Collect and Organize Information as Research Agents | arXiv | -- | arXiv | 13 Jun 2024 | |
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | arXiv | GitHub | arXiv | 22 Jun 2024 | Project |
TravelPlanner: A Benchmark for Real-World Planning with Language Agents | arXiv | GitHub | ICML 24 | 23 Jun 2024 | Project |
GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | arXiv | GitHub | arXiv | 3 Jul 2024 | |
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | arXiv | GitHub | ACL 24 | 26 Jul 2024 | Project Video |
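Benchmarks such as PlanBench (listed above) validate LLM-proposed plans symbolically against the domain rules rather than judging them with another model. Here is a toy sketch of that idea for a one-gripper Blocksworld; the domain encoding and names are illustrative, not PlanBench's actual format.

```python
# Sketch of symbolic plan validation, PlanBench-style: execute each action
# against the domain rules and check the resulting state against the goal.
def validate_plan(stacks: list[list[str]], plan: list[tuple[str, str]],
                  goal: list[list[str]]) -> bool:
    # Each action is (block, destination); destination "table" starts a new stack.
    state = [list(s) for s in stacks]
    for block, dest in plan:
        src = next((s for s in state if s and s[-1] == block), None)
        if src is None:            # block is not clear -> illegal move
            return False
        src.pop()
        if dest == "table":
            state.append([block])
        else:
            tgt = next((s for s in state if s and s[-1] == dest), None)
            if tgt is None:        # destination block not clear -> illegal
                return False
            tgt.append(block)
    state = [s for s in state if s]  # drop emptied stacks
    return sorted(state) == sorted(goal)

# Goal: the stack A-B-C. A valid three-move plan:
ok = validate_plan([["A"], ["B", "C"]],
                   [("C", "table"), ("B", "A"), ("C", "B")],
                   [["A", "B", "C"]])
print(ok)  # -> True
```

Because validity is checked by executing the plan, the benchmark score measures actual goal achievement, not whether the plan merely looks fluent.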
## Miscellaneous Papers

Paper | Link | Code | Venue | Date | Other |
---|---|---|---|---|---|
Lost in the Middle: How Language Models Use Long Contexts | arXiv | -- | TACL 23 | 20 Nov 2023 | |
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 | arXiv | -- | arXiv | 8 Dec 2023 | Project |
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | arXiv | -- | ACL 24 | 19 Feb 2024 | Project |
Better & Faster Large Language Models via Multi-token Prediction | arXiv | -- | arXiv | 30 Apr 2024 | Video HuggingFace |
Learning Iterative Reasoning through Energy Diffusion | arXiv | GitHub | ICML 24 | 17 Jun 2024 | Project |
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | arXiv | GitHub | arXiv | 20 Jun 2024 | |
What's the Magic Word? A Control Theory of LLM Prompting | arXiv | -- | arXiv | 3 Jul 2024 | |
AGENTGEN: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | arXiv | -- | arXiv | 1 Aug 2024 | |
Generative Verifiers: Reward Modeling as Next-Token Prediction | arXiv | -- | arXiv | 27 Aug 2024 | |
## Additional Resources

Resource | Link |
---|---|
Yochan Tutorials on Large Language Models and Planning | link |
On The Capabilities and Risks of Large Language Models | link |
Large Language Models for Reasoning | link |
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | link |
Physics of Language Models | link |
If you want to say thank you and/or support the active development of Awesome LLMs for Planning and Reasoning, add a GitHub star to the project.
Together, we can make Awesome LLMs for Planning and Reasoning better!
First off, thanks for taking the time to contribute! Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make will benefit everybody else and are greatly appreciated.
The original setup of this repository is by Sambhav Khurana.
For a full list of all authors and contributors, see the contributors page.