Awesome-Efficient-Agents
Survey and paper list on efficiency-guided LLM agents (memory, tool learning, planning).
Stars: 159
This repository, Awesome Efficient Agents, is a curated collection of papers on efficiency-guided agent design, organized around three core components of agentic systems: memory, tool learning, and planning. Papers are further grouped by memory processes (construction, management, and access), tool selection, tool calling, tool-integrated reasoning, and planning efficiency, so readers can quickly locate representative work in the field.
README:
🤝 Contributions welcome! Open an issue or submit a pull request to add papers, fix links, or improve categorization.
Recent years have seen growing interest in extending large language models into agentic systems. While agent capabilities have advanced rapidly, efficiency has received comparatively less attention despite being crucial for real-world deployment. This repository studies efficiency-guided agent design from three core components: memory, tool learning, and planning.
We provide a curated paper list to help readers quickly locate representative work, along with lightweight notes on how each topic connects to efficiency. A minimal illustrative sketch after the list below shows where each of these axes appears in a single agent step.
- Efficient Memory. We organize memory-related papers into three processes: construction, management, and access.
- Efficient Tool Learning. We group papers into tool selection, tool calling, and tool-integrated reasoning.
- Efficient Planning. We collect work on planning that improves overall agent efficiency by reducing unnecessary actions and shortening trajectories.
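To make the three axes concrete, here is a minimal, illustrative Python sketch of a single agent step. It is not taken from any paper listed here, and all class and function names are hypothetical; it only marks where memory construction/management/access, tool selection and calling, and planning-length decisions each contribute to cost.

```python
# Illustrative sketch only: hypothetical names, not the API of any listed paper.
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Toy memory with the three processes surveyed: construction, management, access."""
    entries: list[str] = field(default_factory=list)

    def construct(self, observation: str) -> None:
        # Construction: compress before storing so future contexts stay short.
        self.entries.append(observation[:200])

    def manage(self, budget: int) -> None:
        # Management: prune to a fixed budget instead of growing without bound.
        self.entries = self.entries[-budget:]

    def access(self, query: str, k: int = 3) -> list[str]:
        # Access: retrieve a few relevant entries rather than the full history.
        key = query.split()[0].lower()
        return [e for e in self.entries if key in e.lower()][:k]


def select_tools(task: str, tools: dict, k: int = 2) -> dict:
    # Tool selection: expose only a small, relevant subset of a large inventory,
    # shortening both the prompt and the model's decision space.
    relevant = {name: fn for name, fn in tools.items() if name in task.lower()}
    return relevant or dict(list(tools.items())[:k])


def plan(task: str, max_steps: int) -> list[str]:
    # Planning: fewer, non-redundant steps mean shorter trajectories and less cost.
    return [f"step {i + 1} toward: {task}" for i in range(max_steps)]


if __name__ == "__main__":
    memory = Memory()
    tools = {"search": lambda q: f"results for [{q}]", "calculator": lambda q: "42"}
    task = "search recent papers on efficient agents"

    for action in plan(task, max_steps=2):                     # efficient planning
        context = memory.access(task)                          # efficient memory access
        tool = next(iter(select_tools(task, tools).values()))  # efficient tool selection
        result = tool(f"{action} | context: {context}")        # tool calling
        memory.construct(result)                               # efficient memory construction
        memory.manage(budget=16)                               # efficient memory management
    print(memory.entries)
```

In practice, each listed paper replaces one or more of these placeholder functions with a learned or structured mechanism; the sketch only shows where the cost levers sit.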
In the paper, we organize memory into construction, management, and access. Since many papers overlap across these stages, this README is primarily organized around memory construction to avoid redundancy.
- (2025-10) AgentFold: Long-Horizon Web Agents with Proactive Context Management
- (2025-07) MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
- (2025-06) MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
- (2025-04) Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
- (2024-02) Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations
- (2025-09) MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
- (2025-02) M+: Extending MemoryLLM with Scalable Long-Term Memory
- (2025-01) Titans: Learning to Memorize at Test Time
- (2024-09) MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation
- (2024-07) $\text{Memory}^3$: Language Modeling with Explicit Memory
- (2024-02) MEMORYLLM: Towards Self-Updatable Large Language Models
- (2024-01) Long Context Compression with Activation Beacon
- (2025-10) Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- (2025-09) ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
- (2025-08) Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
- (2025-08) Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
- (2025-07) Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
- (2025-06) Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching (Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents)
- (2025-04) Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
- (2025-03) MemInsight: Autonomous Memory Augmentation for LLM Agents
- (2025-03) In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents
- (2025-02) A-MEM: Agentic Memory for LLM Agents
- (2025-02) On Memory Construction and Retrieval for Personalized Conversational Agents
- (2024-06) Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
- (2024-04) "My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents
- (2023-10) RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation
- (2023-08) ExpeL: LLM Agents Are Experiential Learners
- (2023-08) MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
- (2023-05) MemoryBank: Enhancing Large Language Models with Long-Term Memory
- (2026-01) MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
- (2025-10) D-SMART: Enhancing LLM Dialogue Consistency via Dynamic Structured Memory And Reasoning Tree
- (2025-04) Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
- (2025-01) Zep: A Temporal Knowledge Graph Architecture for Agent Memory
- (2024-07) AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
- (2024-06) GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
- (2024-02) KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph
- (2025-10) Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
- (2025-10) LightMem: Lightweight and Efficient Memory-Augmented Generation
- (2025-07) Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents
- (2025-07) MemOS: A Memory OS for AI System
- (2025-06) Memory OS of AI Agent
- (2024-08) HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
- (2024-02) A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
- (2023-10) MemGPT: Towards LLMs as Operating Systems
- (2025-11) Latent Collaboration in Multi-Agent Systems
- (2025-11) MemIndex: Agentic Event-based Distributed Memory Management for Multi-agent Systems
- (2025-10) Cache-to-Cache: Direct Semantic Communication Between Large Language Models
- (2025-10) KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
- (2025-08) RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory
- (2025-07) MIRIX: Multi-Agent Memory System for LLM-Based Agents
- (2025-06) G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
- (2024-04) Memory Sharing for Large Language Model based Agents
- (2025-08) Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory
- (2025-04) AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems
- (2025-02) LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning
- (2025-10) LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation
- (2025-05) Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control
- (2025-01) SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
- (2025-10) ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering
- (2024-10) Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases
- (2024-10) From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
- (2024-02) AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
- (2023-12) ProTIP: Progressive Tool Retrieval Improves Planning
- (2024-09) Efficient and Scalable Estimation of Tool Representations in Vector Space
- (2024-09) TinyAgent: Function Calling at the Edge
- (2025-03) Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models
- (2024-10) Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option
- (2024-10) ToolGen: Unified Tool Retrieval and Calling via Generation
- (2024-07) Concise and Precise Context Compression for Tool-Using Language Models
- (2023-05) ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
- (2024-01) Efficient Tool Use with Chain-of-Abstraction Reasoning
- (2023-02) Toolformer: Language Models Can Teach Themselves to Use Tools
- (2024-11) CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
- (2024-05) An LLM-Tool Compiler for Fused Parallel Function Calling
- (2023-12) An LLM Compiler for Parallel Function Calling
- (2025-07) A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents
- (2025-05) Distilling LLM Agent into Small Models with Retrieval and Code Tools
- (2025-03) Alignment for Efficient Tool Calling of Large Language Models
- (2025-02) ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models
- (2024-02) Budget-Constrained Tool Learning with Planning
- (2024-01) TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
- (2024-09) ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback
- (2023-10) ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
- (2025-11) ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
- (2025-09) ToolRM: Outcome Reward Models for Tool-Calling Large Language Models
- (2025-04) Acting Less is Reasoning More! Teaching Model to Act Efficiently
- (2025-09) TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning
- (2025-05) Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
- (2025-02) SMART: Self-Aware Agent for Tool Overuse Mitigation
- (2024-03) Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
- (2026-01) ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
- (2025-10) PORTool: Tool-Use LLM Training with Rewarded Tree
- (2025-10) A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
- (2025-09) Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
- (2025-09) TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning
- (2025-07) Agentic Reinforced Policy Optimization
- (2025-07) AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning
- (2025-05) Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
- (2025-05) Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
- (2025-04) ToolRL: Reward is All Tool Learning Needs
- (2025-04) ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
- (2025-04) Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
- (2025-04) Acting Less is Reasoning More! Teaching Model to Act Efficiently
- (2025-11) Budget-Aware Tool-Use Enables Effective Agent Scaling
- (2025-09) Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
- (2025-06) Query-Level Uncertainty in Large Language Models
- (2023-12) ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
- (2023-05) SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
- (2023-03) Reflexion: Language Agents with Verbal Reinforcement Learning
- (2025-05) Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning
- (2023-12) ProTIP: Progressive Tool Retrieval Improves Planning
- (2023-10) ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
- (2023-10) Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
- (2025-12) Video-Browser: Towards Agentic Open-web Video Browsing
- (2025-05) Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution
- (2025-03) ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks
- (2024-11) BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks
- (2024-02) AutoGPT+P: Affordance-based Task Planning with Large Language Models
- (2023-05) ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
- (2023-03) HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- (2025-09) Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
- (2025-08) Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning
- (2025-05) Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
- (2025-02) QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
- (2024-03) Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
- (2025-10) GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning
- (2024-07) Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
- (2024-06) GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
- (2024-02) Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
- (2023-05) Voyager: An Open-Ended Embodied Agent with Large Language Models
- (2025-09) MARS: toward more efficient multi-agent collaboration for LLM reasoning
- (2025-08) SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication
- (2025-03) AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration
- (2025-02) S$^2$-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency
- (2024-10) Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
- (2024-09) GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion
- (2024-06) Scaling Large Language Model-based Multi-Agent Collaboration
- (2024-06) Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
- (2025-10) Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
- (2025-09) Free-MAD: Consensus-Free Multi-Agent Debate
- (2025-07) CONSENSAGENT: Towards Efficient and Effective Consensus in Multi-Agent LLM Interactions Through Sycophancy Mitigation
- (2025-07) CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs
- (2024-05) Smurfs: Multi-Agent System using Context-Efficient DFSDT for Tool Planning
- (2025-11) SMAGDi: Socratic Multi Agent Interaction Graph Distillation for Efficient High Accuracy Reasoning
- (2025-06) Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
- (2024-02) MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
Since our work focuses on efficiency, which is ultimately rooted in effectiveness, we have also gathered related survey papers to offer a complementary perspective. We hope this helps bring visibility to valuable surveys that deserve more attention. 💡
- (2026-01) Survey on AI Memory: Theories, Taxonomies, Evaluations, and Emerging Trends
- (2025-12) Memory in the Age of AI Agents
- (2025-05) Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
- (2025-04) From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs
- (2024-04) A Survey on the Memory Mechanism of Large Language Model based Agents
- (2025-08) LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios
- (2024-02) Understanding the planning of LLM agents: A survey
If you find this survey useful, please cite:
@misc{yang2026efficientagentsmemorytool,
title={Toward Efficient Agents: Memory, Tool learning, and Planning},
author={Xiaofang Yang and Lijun Li and Heng Zhou and Tong Zhu and Xiaoye Qu and Yuchen Fan and Qianshan Wei and Rui Ye and Li Kang and Yiran Qin and Zhiqiang Kou and Daizong Liu and Qi Li and Ning Ding and Siheng Chen and Jing Shao},
year={2026},
eprint={2601.14192},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2601.14192},
}