awesome-tool-llm

Stars: 114

This repository focuses on exploring tools that enhance the performance of language models for various tasks. It provides a structured list of literature relevant to tool-augmented language models, covering topics such as tool basics, tool use paradigm, scenarios, advanced methods, and evaluation. The repository includes papers, preprints, and books that discuss the use of tools in conjunction with language models for tasks like reasoning, question answering, mathematical calculations, accessing knowledge, interacting with the world, and handling non-textual modalities.

README:

🛠️ Awesome LMs with Tools

Language models (LMs) are powerful, yet they are mostly limited to text-generation tasks. Tools substantially enhance their performance on tasks that require complex skills.

Based on our recent survey about LM-used tools, "What Are Tools Anyway? A Survey from the Language Model Perspective", we provide a structured list of literature relevant to tool-augmented LMs.

  • Tool basics ($\S2$)
  • Tool use paradigm ($\S3$)
  • Scenarios ($\S4$)
  • Advanced methods ($\S5$)
  • Evaluation ($\S6$)

If you find our paper or code useful, please cite the paper:

@article{wang2022what,
  title={What Are Tools Anyway? A Survey from the Language Model Perspective},
  author={Zhiruo Wang and Zhoujun Cheng and Hao Zhu and Daniel Fried and Graham Neubig},
  journal={arXiv preprint arXiv:2403.15452},
  year={2024}
}

$\S2$ Tool Basics

$\S2.1$ What are tools? 🛠️

  • Definition and discussion of animal-used tools

    Animal tool behavior: the use and manufacture of tools by animals Shumaker, Robert W., Kristina R. Walkup, and Benjamin B. Beck. 2011 [Book]

  • Early discussions on LM-used tools

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

  • A survey on augmented LMs, including tool augmentation

    Augmented Language Models: a Survey Mialon, Grégoire, et al. 2023.02 [Paper]

$\S2.3$ Tools and "Agents" 🤖

  • Definition of agents

    Artificial Intelligence: A Modern Approach Russell, Stuart J., and Peter Norvig. 2016 [Book]

  • Survey about agents that perceive and act in the environment

    The Rise and Potential of Large Language Model Based Agents: A Survey Xi, Zhiheng, et al. 2023.09 [Preprint]

  • Survey about the cognitive architectures for language agents

    Cognitive Architectures for Language Agents Sumers, Theodore R., et al. 2023.09 [Paper]

$\S3$ The basic tool use paradigm

  • Early works that set up the commonly used tooling paradigm

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

Inference-time prompting

  • Provide in-context examples of tool use on visual programming problems

    Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]

  • Tool learning via in-context examples on reasoning problems involving text or multi-modal inputs

    Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models Lu, Pan, et al. 2024 [Paper]

  • Tool use based on in-context learning for reasoning problems in BigBench and MMLU

    ART: Automatic multi-step reasoning and tool-use for large language models Paranjape, Bhargavi, et al. 2023.03 [Preprint]

  • Providing tool documentation for in-context tool learning

    Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models Hsieh, Cheng-Yu, et al. 2023.08 [Preprint]

Learning by training

  • Training on human-annotated examples of (NL input, tool-using solution output) pairs

    API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Li, Minghao, et al. 2023.12 [Paper]

    Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems Kadlčík, Marek, et al. 2023 [Paper]

  • Training on model-synthesized examples

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

    MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Huang, Yue, et al. 2023.10 [Paper]

    Making Language Models Better Tool Learners with Execution Feedback Qiao, Shuofei, et al. 2023.05 [Preprint]

    LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error Wang, Boshi, et al. 2024.03 [Preprint]

  • Self-training with bootstrapped examples

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

$\S4$ Scenarios

Knowledge access 📚

  • Collect data from structured knowledge sources, e.g., databases, knowledge graphs, etc.

    LaMDA: Language Models for Dialog Applications Thoppilan, Romal, et al. 2022.01 [Paper]

    TALM: Tool Augmented Language Models Parisi, Aaron, Yao Zhao, and Noah Fiedel. 2022.05 [Preprint]

    ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings Hao, Shibo, et al. 2024 [Paper]

    ToolQA: A Dataset for LLM Question Answering with External Tools Zhuang, Yuchen, et al. 2024 [Paper]

    Middleware for LLMs: Tools are Instrumental for Language Agents in Complex Environments Gu, Yu, et al. 2024 [Paper]

    GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information Jin, Qiao, et al. 2024 [Paper]

  • Search for information on the web

    Internet-augmented language models through few-shot prompting for open-domain question answering Lazaridou, Angeliki, et al. 2022.03 [Paper]

    Internet-Augmented Dialogue Generation Komeili, Mojtaba, Kurt Shuster, and Jason Weston. 2022 [Paper]

  • Viewing retrieval models as tools under the retrieval-augmented generation context

    Retrieval-based Language Models and Applications Asai, Akari, et al. 2023 [Tutorial]

    Augmented Language Models: a Survey Mialon, Grégoire, et al. 2023.02 [Paper]

Computation activities 🔣

  • Using a calculator for math calculations

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

    Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems Kadlčík, Marek, et al. 2023 [Paper]

  • Using programs/Python interpreter to perform more complex operations

    Pal: Program-aided language models Gao, Luyu, et al. 2023 [Paper]

    Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks Chen, Wenhu, et al. 2022.11 [Paper]

    Mint: Evaluating llms in multi-turn interaction with tools and language feedback Wang, Xingyao, et al. 2023.09 [Paper]

    MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning Das, Debrup, et al. 2024 [Preprint]

    ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving Gou, Zhibin, et al. 2023.09 [Paper]

  • Tools for more advanced business activities, e.g., financial, medical, education, etc.

    On the Tool Manipulation Capability of Open-source Large Language Models Xu, Qiantong, et al. 2023.05 [Paper]

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

    Mint: Evaluating llms in multi-turn interaction with tools and language feedback Wang, Xingyao, et al. 2023.09 [Paper]

    AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning Jin, Qiao, et al. 2024.02 [Paper]

Interaction with the world 🌐

  • Access real-time or real-world information such as weather, location, etc.

    On the Tool Manipulation Capability of Open-source Large Language Models Xu, Qiantong, et al. 2023.05 [Paper]

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

  • Managing personal events such as calendars or emails

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

  • Tools in embodied environments, e.g., the Minecraft world

    Voyager: An Open-Ended Embodied Agent with Large Language Models Wang, Guanzhi, et al. 2023.05 [Paper]

  • Tools interacting with the physical world

    ProgPrompt: Generating Situated Robot Task Plans using Large Language Models Singh, Ishika, et al. 2023 [Paper]

    Alfred: A benchmark for interpreting grounded instructions for everyday tasks Shridhar, Mohit, et al. 2020 [Paper]

    Autonomous chemical research with large language models Boiko, Daniil A., et al. 2023 [Paper]

Non-textual modalities 🎞️

  • Tools providing access to information in non-textual modalities

    Vipergpt: Visual inference via python execution for reasoning Surís, Dídac, Sachit Menon, and Carl Vondrick. 2023 [Paper]

    MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action Yang, Zhengyuan, et al. 2023.03 [Preprint]

    AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn Gao, Difei, et al. 2023.06 [Preprint]

  • Tools that can answer questions about data in other modalities

    Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]

Special-skilled models 🤗

  • Text-generation models that can perform specific tasks, e.g., question answering, machine translation

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

    ART: Automatic multi-step reasoning and tool-use for large language models Paranjape, Bhargavi, et al. 2023.03 [Preprint]

  • Integration of available models on Huggingface, TorchHub, TensorHub, etc.

    HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face Shen, Yongliang, et al. 2024 [Paper]

    Gorilla: Large language model connected with massive apis Patil, Shishir G., et al. 2023.05 [Paper]

    Taskbench: Benchmarking large language models for task automation Shen, Yongliang, et al. 2023.11 [Paper]

$\S5$ Advanced methods

$\S5.1$ Complex tool selection and usage 🧐

  • Train retrievers that map natural language instructions to tool documentation

    DocPrompting: Generating Code by Retrieving the Docs Zhou, Shuyan, et al. 2022.07 [Paper]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

  • Ask LMs to write hypothetical tool descriptions and search relevant tools

    CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets Yuan, Lifan, et al. 2023.09 [Paper]

  • Complex tool usage, e.g., parallel calls

    Function Calling and Other API Updates Eleti, Atty, et al. 2023.06 [Blog]

$\S5.2$ Tools in programmatic contexts 👩‍💻

  • Domain-specific logical forms to query structured data

    Semantic parsing on freebase from question-answer pairs Berant, Jonathan, et al. 2013 [Paper]

    Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task Yu, Tao, et al. 2018.09 [Paper]

    Break It Down: A Question Understanding Benchmark Wolfson, Tomer, et al. 2020 [Paper]

  • Domain-specific actions for agentic tasks such as web navigation

    Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration Liu, Evan Zheran, et al. 2018.02 [Paper]

    WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents Yao, Shunyu, et al. 2022.07 [Paper]

    Webarena: A realistic web environment for building autonomous agents Zhou, Shuyan, et al. 2023.07 [Paper]

  • Using external Python libraries as tools

    ToolCoder: Teach Code Generation Models to use API search tools Zhang, Kechi, et al. 2023.05 [Paper]

  • Using expert-designed functions as tools to answer questions about images

    Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]

    Vipergpt: Visual inference via python execution for reasoning Surís, Dídac, Sachit Menon, and Carl Vondrick. 2023 [Paper]

  • Using GPT as a tool to query external Wikipedia knowledge for table-based question answering

    Binding Language Models in Symbolic Languages Cheng, Zhoujun, et al. 2022.10 [Paper]

  • Incorporate QA API and operation APIs to assist table-based question answering

    API-Assisted Code Generation for Question Answering on Varied Table Structures Cao, Yihan, et al. 2023.12 [Paper]

$\S5.3$ Tool creation and reuse 👩‍🔬

  • Approaches to abstract libraries for domain-specific logical forms from a large corpus

    DreamCoder: growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning Ellis, Kevin, et al. 2020.06 [Paper]

    Leveraging Language to Learn Program Abstractions and Search Heuristics Wong, Catherine, et al. 2021 [Paper]

    Top-Down Synthesis for Library Learning Bowers, Matthew, et al. 2023 [Paper]

    LILO: Learning Interpretable Libraries by Compressing and Documenting Code Grand, Gabriel, et al. 2023.10 [Paper]

  • Make and learn skills (JavaScript programs) in the embodied Minecraft world

    Voyager: An Open-Ended Embodied Agent with Large Language Models Wang, Guanzhi, et al. 2023.05 [Paper]

  • Leverage LMs as tool makers on BigBench tasks

    Large Language Models as Tool Makers Cai, Tianle, et al. 2023.05 [Preprint]

  • Create tools for math and table QA tasks by example-wise tool making

    CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation Qian, Cheng, et al. 2023.05 [Paper]

  • Make tools via heuristic-based training and tool deduplication

    CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets Yuan, Lifan, et al. 2023.09 [Paper]

  • Learning tools by refactoring a small number of programs

    ReGAL: Refactoring Programs to Discover Generalizable Abstractions Stengel-Eskin, Elias, Archiki Prasad, and Mohit Bansal. 2024.01 [Preprint]

  • A training-free approach to make tools via execution consistency

    🎁 TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks Wang, Zhiruo, Daniel Fried, and Graham Neubig. 2024.01 [Preprint]

$\S6$ Evaluation: Testbeds

$\S6.1.1$ Repurposed existing datasets

  • Datasets that require reasoning over texts

    Measuring Mathematical Problem Solving With the MATH Dataset Hendrycks, Dan, et al. 2021.03 [Paper]

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Srivastava, Aarohi, et al. 2022.06 [Paper]

  • Datasets that require reasoning over structured data, e.g., tables

    Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning Lu, Pan, et al. 2022.09 [Paper]

    Compositional Semantic Parsing on Semi-Structured Tables Pasupat, Panupong, and Percy Liang. 2015 [Paper]

    HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation Cheng, Zhoujun, et al. 2022 [Paper]

  • Datasets that require reasoning over other modalities, e.g., images and image pairs

    Gqa: A new dataset for real-world visual reasoning and compositional question answering Hudson, Drew A., and Christopher D. Manning. 2019.02 [Paper]

    A Corpus for Reasoning about Natural Language Grounded in Photographs Suhr, Alane, et al. 2019 [Paper]

  • Example datasets that require a retriever model (tool) to solve

    Natural Questions: A Benchmark for Question Answering Research Kwiatkowski, Tom, et al. 2019 [Paper]

    TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension Joshi, Mandar, et al. 2017 [Paper]

$\S6.1.2$ Aggregated API benchmarks

  • Collect RapidAPIs and use models to synthesize examples for evaluation

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

  • Collect APIs from PublicAPIs and use models to synthesize examples

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

  • Collect APIs from PublicAPIs and manually annotate examples for evaluation

    API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Li, Minghao, et al. 2023.12 [Paper]

  • Collect APIs from OpenAI plugin list and use models to synthesize examples

    MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Huang, Yue, et al. 2023.10 [Paper]

  • Collect neural model tools from Huggingface hub, TorchHub, and TensorHub

    Gorilla: Large language model connected with massive apis Patil, Shishir G., et al. 2023.05 [Paper]

  • Collect neural model tools from Huggingface

    HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face Shen, Yongliang, et al. 2024 [Paper]

  • Collect tools from Huggingface and PublicAPIs

    Taskbench: Benchmarking large language models for task automation Shen, Yongliang, et al. 2023.11 [Paper]
