Awesome-Agent-Papers

[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Stars: 98

🤖 Comprehensive LLM Agent Research Collection

🌟 Overview

This repository contains a comprehensive collection of research papers on Large Language Model (LLM) agents. We organize papers across key categories including agent construction, collaboration mechanisms, evolution, tools, security, benchmarks, and applications.

Our taxonomy provides a structured framework for understanding the rapidly evolving field of LLM agents, from architectural foundations to practical implementations. The repository bridges fragmented research threads by highlighting connections between agent design principles and emergent behaviors.

📄 Read our survey paper: Large Language Model Agent: A Survey on Methodology, Applications and Challenges (arXiv:2503.21460)

📊 Statistics & Trends

Our survey covers the rapidly evolving field of LLM agents, with a significant increase in research publications since 2023.

🔍 Key Categories

Agent Construction: Methodologies and architectures for building LLM agents
Agent Collaboration: Frameworks for multi-agent interaction and cooperation
Agent Evolution: Self-improvement and learning capabilities of agents
Tools: Integration of external tools and APIs with LLM agents
Security: Security concerns and protections for LLM agent systems
Datasets & Benchmarks: Evaluation frameworks and datasets for testing agent capabilities
Applications: Real-world implementations and use cases

📚 Resource List

Title	Category	Year	Link
Adaptive Collaboration Strategy for LLMs in Medical Decision Making	Agent Collaboration	2024	link
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs	Agent Collaboration	2024	link
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework	Agent Collaboration	2024	link
Debating with More Persuasive LLMs Leads to More Truthful Answers	Agent Collaboration	2024	link
RoCo: Dialectic multi-robot collaboration with large language models	Agent Collaboration	2024	link
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning	Agent Collaboration	2024	link
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding	Agent Collaboration	2024	link
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate	Agent Collaboration	2024	link
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors	Agent Collaboration	2024	link
ChatDev: Communicative Agents for Software Development	Agent Collaboration	2024	link
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate	Agent Collaboration	2024	link
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration	Agent Collaboration	2024	link
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation	Agent Collaboration	2023	link
Improving Factuality and Reasoning in Language Models through Multiagent Debate	Agent Collaboration	2023	link
Autonomous chemical research with large language models	Agent Collaboration	2023	link
Planning with Multi-Constraints via Collaborative Language Agents	Agent Construction	2025	link
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making	Agent Construction	2025	link
AutoAgents: A Framework for Automatic Agent Generation	Agent Construction	2024	link
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework	Agent Construction	2024	link
Cognitive Architectures for Language Agents	Agent Construction	2024	link
Executable Code Actions Elicit Better LLM Agents	Agent Construction	2024	link
ChatDev: Communicative Agents for Software Development	Agent Construction	2024	link
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents	Agent Construction	2024	link
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration	Agent Construction	2024	link
More Agents Is All You Need	Agent Construction	2024	link
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents	Agent Construction	2024	link
Empowering biomedical discovery with AI agents	Agent Construction	2024	link
SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models	Agent Construction	2024	link
Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions	Agent Construction	2024	link
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning	Agent Construction	2024	link
PlanCritic: Formal Planning with Human Feedback	Agent Construction	2024	link
Enhancing Robot Task Planning: Integrating Environmental Information and Feedback Insights through Large Language Models	Agent Construction	2024	link
Devil's Advocate: Anticipatory Reflection for LLM Agents	Agent Construction	2024	link
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios	Agent Construction	2024	link
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society	Agent Construction	2023	link
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation	Agent Construction	2023	link
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation	Agent Construction	2023	link
War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars	Agent Construction	2023	link
Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents	Agent Construction	2023	link
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage	Agent Construction	2023	link
Evolutionary optimization of model merging recipes	Agent Evolution	2025	link
CREAM: Consistency Regularized Self-Rewarding Language Models	Agent Evolution	2025	link
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents	Agent Evolution	2025	link
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation	Agent Evolution	2024	link
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization	Agent Evolution	2024	link
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning	Agent Evolution	2024	link
A Survey on Self-Evolution of Large Language Models	Agent Evolution	2024	link
LLM-Evolve: Evaluation for LLM’s Evolving Capability on Benchmarks	Agent Evolution	2024	link
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing	Agent Evolution	2024	link
Iterative Translation Refinement with Large Language Models	Agent Evolution	2024	link
Agent Alignment in Evolving Social Norms	Agent Evolution	2024	link
Mitigating the Alignment Tax of RLHF	Agent Evolution	2024	link
Self-Rewarding Language Models	Agent Evolution	2024	link
V-STaR: Training Verifiers for Self-Taught Reasoners	Agent Evolution	2024	link
RLCD: Reinforcement learning from contrastive distillation for LM alignment	Agent Evolution	2024	link
Language Model Self-Improvement by Reinforcement Learning Contemplation	Agent Evolution	2024	link
ProAgent: Building Proactive Cooperative Agents with Large Language Models	Agent Evolution	2024	link
Agent Planning with World Knowledge Model	Agent Evolution	2024	link
Refining Guideline Knowledge for Agent Planning Using Textgrad	Agent Evolution	2024	link
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate	Agent Evolution	2024	link
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error	Agent Evolution	2024	link
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback	Agent Evolution	2023	link
SELF-REFINE: Iterative Refinement with Self-Feedback	Agent Evolution	2023	link
Self-Evolution Learning for Discriminative Language Model Pretraining	Agent Evolution	2023	link
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning	Agent Evolution	2023	link
SELFEVOLVE: A Code Evolution Framework via Large Language Models	Agent Evolution	2023	link
SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions	Agent Evolution	2023	link
Large Language Models are Better Reasoners with Self-Verification	Agent Evolution	2023	link
CodeT: Code Generation with Generated Tests	Agent Evolution	2023	link
Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games	Agent Evolution	2023	link
Improving Factuality and Reasoning in Language Models through Multiagent Debate	Agent Evolution	2023	link
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society	Agent Evolution	2023	link
STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning	Agent Evolution	2022	link
An active inference strategy for prompting reliable responses from large language models in medical practice	Applications	2025	link
An evaluation framework for clinical use of large language models in patient interaction tasks	Applications	2025	link
Large Language Models lack essential metacognition for reliable medical reasoning	Applications	2025	link
Balancing autonomy and expertise in autonomous synthesis laboratories	Applications	2025	link
Motif: Intrinsic Motivation from Artificial Intelligence Feedback	Applications	2024	link
Baba Is AI: Break the Rules to Beat the Benchmark	Applications	2024	link
Large language model-empowered agents for simulating macroeconomic activities	Applications	2024	link
CompeteAI: Understanding the Competition Dynamics in Large Language Model-based Agents	Applications	2024	link
Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support	Applications	2024	link
Exploring Collaboration Mechanisms for LLM Agents	Applications	2024	link
Simulating Human Society with Large Language Model Agents: City, Social Media, and Economic System	Applications	2024	link
Can large language models transform computational social science?	Applications	2024	link
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems	Applications	2024	link
On Generative Agents in Recommendation	Applications	2024	link
ChatDev: Communicative Agents for Software Development	Applications	2024	link
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments	Applications	2024	link
SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning	Applications	2024	link
Medical large language models are susceptible to targeted misinformation attacks	Applications	2024	link
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents	Applications	2023	link
Language Models Meet World Models: Embodied Experiences Enhance Language Models	Applications	2023	link
ChessGPT: Bridging Policy Learning and Language Modeling	Applications	2023	link
MindAgent: Emergent gaming interaction	Applications	2023	link
Exploring large language models for communication games: An empirical study on Werewolf	Applications	2023	link
Language as reality: a co-creative storytelling game experience in 1001 nights using generative AI	Applications	2023	link
TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance	Applications	2023	link
Using large language models to simulate multiple humans and replicate human subject studies	Applications	2023	link
Generative Agents: Interactive Simulacra of Human Behavior	Applications	2023	link
Self-collaboration Code Generation via ChatGPT	Applications	2023	link
Language models can solve computer tasks	Applications	2023	link
ChemCrow: Augmenting large-language models with chemistry tools	Applications	2023	link
AlphaFlow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning	Applications	2023	link
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents	Applications	2022	link
Stress-testing the resilience of the Austrian healthcare system using agent-based simulation	Applications	2022	link
AgentHarm: Benchmarking Robustness of LLM Agents on Harmful Tasks	Datasets & Benchmarks	2025	link
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator	Datasets & Benchmarks	2025	link
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation	Datasets & Benchmarks	2025	link
DCA-Bench: A Benchmark for Dataset Curation Agents	Datasets & Benchmarks	2025	link
MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents	Datasets & Benchmarks	2025	link
MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering	Datasets & Benchmarks	2025	link
EgoLife: Towards Egocentric Life Assistant	Datasets & Benchmarks	2025	link
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?	Datasets & Benchmarks	2025	link
AgentBench: Evaluating LLMs as Agents	Datasets & Benchmarks	2024	link
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents	Datasets & Benchmarks	2024	link
BENCHAGENTS: Automated Benchmark Creation with Agent Interaction	Datasets & Benchmarks	2024	link
Benchmarking Data Science Agents	Datasets & Benchmarks	2024	link
Benchmarking Large Language Models as AI Research Agents	Datasets & Benchmarks	2024	link
Benchmarking Large Language Models for Multi-agent Systems: A Comparative Analysis of AutoGen, CrewAI, and TaskWeaver	Datasets & Benchmarks	2024	link
BLADE: Benchmarking Language Model Agents	Datasets & Benchmarks	2024	link
CRAB: Cross-platform agent benchmark for multi-modal embodied language model agents	Datasets & Benchmarks	2024	link
CToolEval: A Chinese Benchmark for LLM-Powered Agent Evaluation in Real-World API Interactions	Datasets & Benchmarks	2024	link
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models	Datasets & Benchmarks	2024	link
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making	Datasets & Benchmarks	2024	link
GTA: A Benchmark for General Tool Agents	Datasets & Benchmarks	2024	link
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs	Datasets & Benchmarks	2024	link
ML Research Benchmark	Datasets & Benchmarks	2024	link
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains	Datasets & Benchmarks	2024	link
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web	Datasets & Benchmarks	2024	link
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments	Datasets & Benchmarks	2024	link
Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs	Datasets & Benchmarks	2024	link
Seal-Tools: Self-instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark	Datasets & Benchmarks	2024	link
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents	Datasets & Benchmarks	2024	link
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks	Datasets & Benchmarks	2024	link
Tur[k]ingBench: A Challenge Benchmark for Web Agents	Datasets & Benchmarks	2024	link
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models	Datasets & Benchmarks	2024	link
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories	Datasets & Benchmarks	2024	link
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning	Datasets & Benchmarks	2024	link
AgentTuning: Enabling Generalized Agent Abilities for LLMs	Datasets & Benchmarks	2024	link
Executable Code Actions Elicit Better LLM Agents	Datasets & Benchmarks	2024	link
FireAct: Toward Language Agent Fine-tuning	Datasets & Benchmarks	2023	link
Medical large language models are vulnerable to data-poisoning attacks	Ethics	2025	link
Foundation Models and Fair Use	Ethics	2024	link
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model	Ethics	2023	link
LLaMA: Open and Efficient Foundation Language Models	Ethics	2023	link
Predictability and Surprise in Large Generative Models	Ethics	2022	link
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜	Ethics	2021	link
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets	Ethics	2021	link
GPT-3: Its Nature, Scope, Limits, and Consequences	Ethics	2020	link
Energy and Policy Considerations for Modern Deep Learning Research	Ethics	2020	link
Defending Against Neural Fake News	Ethics	2019	link
RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage	Security	2025	link
Red-Teaming LLM Multi-Agent Systems via Communication Attacks	Security	2025	link
Unveiling Privacy Risks in LLM Agent Memory	Security	2025	link
AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks	Security	2025	link
Firewalls to Secure Dynamic LLM Agentic Networks	Security	2025	link
AutoHijacker: Automatic Indirect Prompt Injection Against Black-Box LLM Agents	Security	2025	link
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways	Security	2025	link
DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent	Security	2025	link
CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models	Security	2025	link
G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems	Security	2025	link
AgentHarm: Benchmarking Robustness of LLM Agents on Harmful Tasks	Security	2025	link
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks	Security	2025	link
LLM-based Multi-Agent Systems: Techniques and Business Perspectives	Security	2024	link
BlockAgents: Towards Byzantine-Robust LLM-Based Multi-Agent Coordination via Blockchain	Security	2024	link
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems	Security	2024	link
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents	Security	2024	link
AGENTPOISON: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases	Security	2024	link
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks	Security	2024	link
Imprompter: Tricking LLM Agents into Improper Tool Use	Security	2024	link
Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation	Security	2024	link
Prompt Injection as a Defense Against LLM-driven Cyberattacks	Security	2024	link
Evil Geniuses: Delving into the Safety of LLM-based Agents	Security	2024	link
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents	Security	2024	link
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents	Security	2024	link
CLAS 2024: The Competition for LLM and Agent Safety	Security	2024	link
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents	Security	2024	link
WIPI: A New Web Threat for LLM-Driven Web Agents	Security	2024	link
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast	Security	2024	link
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety	Security	2024	link
Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In	Security	2024	link
AGENT-SAFETYBENCH: Evaluating the Safety of LLM Agents	Security	2024	link
INJECAGENT: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents	Security	2024	link
TrustAgent: Towards Safe and Trustworthy LLM-based Agents	Security	2024	link
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents	Security	2024	link
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents	Security	2024	link
NetSafe: Exploring the Topological Safety of Multi-agent Networks	Security	2024	link
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents	Security	2024	link
Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey	Survey	2025	link
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks	Survey	2025	link
Multi-Agent Collaboration Mechanisms: A Survey of LLMs	Survey	2025	link
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways	Survey	2025	link
Large Model Based Agents: State-of-the-Art, Cooperation Paradigms, Security and Privacy, and Future Trends	Survey	2024	link
Agent AI: Surveying the Horizons of Multimodal Interaction	Survey	2024	link
Large Language Model based Multi-Agents: A Survey of Progress and Challenges	Survey	2024	link
Large Multimodal Agents: A Survey	Survey	2024	link
Understanding the planning of LLM agents: A survey	Survey	2024	link
Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective	Survey	2024	link
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security	Survey	2024	link
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey	Survey	2024	link
Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects	Survey	2024	link
Position Paper: Agent AI Towards a Holistic Intelligence	Survey	2024	link
LLM With Tools: A Survey	Survey	2024	link
A Survey on the Memory Mechanism of Large Language Model based Agents	Survey	2024	link
A Survey on Large Language Model-Based Game Agents	Survey	2024	link
Large Language Models and Games: A Survey and Roadmap	Survey	2024	link
Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents	Survey	2024	link
Security of AI Agents	Survey	2024	link
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies	Survey	2024	link
Inferring the Goals of Communicating Agents from Actions and Instructions	Survey	2024	link
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations	Survey	2024	link
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey	Survey	2024	link
A survey on large language model based autonomous agents	Survey	2023	link
The rise and potential of large language model based agents: a survey	Survey	2023	link
Large Language Model Alignment: A Survey	Survey	2023	link
Ethical and social risks of harm from Language Models	Survey	2021	link
On the Opportunities and Risks of Foundation Models	Survey	2021	link
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims	Survey	2020	link
Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products	Survey	2019	link
ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models	Tools	2025	link
Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval	Tools	2024	link
Chain of Tools: Large Language Model is an Automatic Multi-tool Learner	Tools	2024	link
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction	Tools	2024	link
ToolGen: Unified Tool Retrieval and Calling via Generation	Tools	2024	link
ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph	Tools	2024	link
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback	Tools	2024	link
Making Language Models Better Tool Learners with Execution Feedback	Tools	2024	link
Leveraging Large Language Models to Improve REST API Testing	Tools	2024	link
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error	Tools	2024	link
Skills-in-Context: Unlocking Compositionality in Large Language Models	Tools	2024	link
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs	Tools	2024	link
Gorilla: Large Language Model Connected with Massive APIs	Tools	2024	link
Large Language Models as Tool Makers	Tools	2024	link
Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents	Tools	2023	link
Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations	Tools	2023	link
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs	Tools	2023	link
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems	Tools	2023	link
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage	Tools	2023	link
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction	Tools	2023	link
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs	Tools	2023	link
ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models	Tools	2023	link
ToolQA: A Dataset for LLM Question Answering with External Tools	Tools	2023	link
On the Tool Manipulation Capability of Open-source Large Language Models	Tools	2023	link
RestGPT: Connecting Large Language Models with Real-World RESTful APIs	Tools	2023	link
Toolformer: Language Models Can Teach Themselves to Use Tools	Tools	2023	link
WebCPM: Interactive Web Search for Chinese Long-form Question Answering	Tools	2023	link
ToolCoder: Teach Code Generation Models to use API search tools	Tools	2023	link
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases	Tools	2023	link
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings	Tools	2023	link
MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting	Tools	2023	link
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models	Tools	2023	link
GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution	Tools	2023	link
Dify	Tools	2023	link
LangChain	Tools	2023	link
WebGPT: Browser-assisted question-answering with human feedback	Tools	2022	link
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance	Tools	2020	link

🤝 Contributing

We welcome contributions to expand our collection. You can:

Submit a pull request to add papers or resources
Open an issue to suggest additional papers or resources
Submit your paper via our submission form or email the maintainers

We regularly update the repository to include new research.

📝 Citation

If you find our survey helpful, please consider citing our work:


@article{agentsurvey2025,
  title={Large Language Model Agent: A Survey on Methodology, Applications and Challenges},
  author={Junyu Luo and Weizhi Zhang and Ye Yuan and Yusheng Zhao and Junwei Yang and Yiyang Gu and Bohan Wu and Binqi Chen and Ziyue Qiao and Qingqing Long and Rongcheng Tu and Xiao Luo and Wei Ju and Zhiping Xiao and Yifan Wang and Meng Xiao and Chenwu Liu and Jingyang Yuan and Shichang Zhang and Yiqiao Jin and Fan Zhang and Xian Wu and Hanqing Zhao and Dacheng Tao and Philip S. Yu and Ming Zhang},
  journal={arXiv preprint arXiv:2503.21460},
  year={2025}
}

For questions or suggestions, please open an issue or contact the repository maintainers.

Alternative AI tools for Awesome-Agent-Papers

Similar Open Source Tools

Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. This is a list of papers on LLM inference and serving.

Stars: 184

Awesome-Resource-Efficient-LLM-Papers

A curated list of high-quality papers on resource-efficient Large Language Models (LLMs) with a focus on various aspects such as architecture design, pre-training, fine-tuning, inference, system design, and evaluation metrics. The repository covers topics like efficient transformer architectures, non-transformer architectures, memory efficiency, data efficiency, model compression, dynamic acceleration, deployment optimization, support infrastructure, and other related systems. It also provides detailed information on computation metrics, memory metrics, energy metrics, financial cost metrics, network communication metrics, and other metrics relevant to resource-efficient LLMs. The repository includes benchmarks for evaluating the efficiency of NLP models and references for further reading.

Stars: 105

Awesome-LLM4IE-Papers

Stars: 645

Awesome-LLM-Papers-Comprehensive-Topics

Stars: 172

LLM4EC

LLM4EC is an interdisciplinary research repository focusing on the intersection of Large Language Models (LLM) and Evolutionary Computation (EC). It provides a comprehensive collection of papers and resources exploring various applications, enhancements, and synergies between LLM and EC. The repository covers topics such as LLM-assisted optimization, EA-based LLM architecture search, and applications in code generation, software engineering, neural architecture search, and other generative tasks. The goal is to facilitate research and development in leveraging LLM and EC for innovative solutions in diverse domains.

Stars: 78

ai-game-devtools

Stars: 735

AudioLLM

AudioLLMs is a curated collection of research papers focusing on developing, implementing, and evaluating language models for audio data. The repository aims to provide researchers and practitioners with a comprehensive resource to explore the latest advancements in AudioLLMs. It includes models for speech interaction, speech recognition, speech translation, audio generation, and more. Additionally, it covers methodologies like multitask audioLLMs and segment-level Q-Former, as well as evaluation benchmarks like AudioBench and AIR-Bench. Adversarial attacks such as VoiceJailbreak are also discussed.

Stars: 71

ai-game-development-tools

Here we keep track of AI game development tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥 Categories: Tool (AI LLM), Game (Agent), Code, Framework, Writer, Image, Texture, Shader, 3D Model, Avatar, Animation, Video, Audio, Music, Singing Voice, Speech, Analytics, Video Tool.

Stars: 312

ai-reference-models

The Intel® AI Reference Models repository contains links to pre-trained models, sample scripts, best practices, and tutorials for popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors and Intel® Data Center GPUs. The purpose is to quickly replicate complete software environments showcasing the AI capabilities of Intel platforms. It includes optimizations for popular deep learning frameworks like TensorFlow and PyTorch, with additional plugins/extensions for improved performance. The repository is licensed under Apache License Version 2.0.

Stars: 676

models

The Intel® AI Reference Models repository contains links to pre-trained models, sample scripts, best practices, and tutorials for popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors and Intel® Data Center GPUs. It aims to replicate the best-known performance of target model/dataset combinations in optimally-configured hardware environments. The repository will be deprecated upon the publication of v3.2.0 and will no longer be maintained or published.

Stars: 669

Cool-GenAI-Fashion-Papers

Cool-GenAI-Fashion-Papers is a curated list of resources related to GenAI-Fashion, including papers, workshops, companies, and products. It covers a wide range of topics such as fashion design synthesis, outfit recommendation, fashion knowledge extraction, trend analysis, and more. The repository provides valuable insights and resources for researchers, industry professionals, and enthusiasts interested in the intersection of AI and fashion.

Stars: 129

LLM-PlayLab

LLM-PlayLab is a repository containing various projects related to LLM (Large Language Models) fine-tuning, generative AI, time-series forecasting, and crash courses. It includes projects for text generation, sentiment analysis, data analysis, chat assistants, image captioning, and more. The repository offers a wide range of tools and resources for exploring and implementing advanced AI techniques.

Stars: 105

awesome-llm-planning-reasoning

The 'Awesome LLMs Planning Reasoning' repository is a curated collection focusing on exploring the capabilities of Large Language Models (LLMs) in planning and reasoning tasks. It includes research papers, code repositories, and benchmarks that delve into innovative techniques, reasoning limitations, and standardized evaluations related to LLMs' performance in complex cognitive tasks. The repository serves as a comprehensive resource for researchers, developers, and enthusiasts interested in understanding the advancements and challenges in leveraging LLMs for planning and reasoning in real-world scenarios.

Stars: 117

LLM4Opt

LLM4Opt is a collection of references and papers focusing on applying Large Language Models (LLMs) to diverse optimization tasks. The repository includes research papers, tutorials, workshops, competitions, and related collections on LLMs in optimization. It covers a wide range of topics such as algorithm search, code generation, machine learning, science, industry, and more. The goal is to provide a comprehensive resource for researchers and practitioners interested in leveraging LLMs for optimization tasks.

Stars: 125

Data-and-AI-Concepts

This repository is a curated collection of data science and AI concepts and IQs, covering topics from foundational mathematics to cutting-edge generative AI concepts. It aims to support learners and professionals preparing for various data science roles by providing detailed explanations and notebooks for each concept.

Stars: 152
