
Awesome-Agent-Papers
[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Stars: 98

README:
This repository contains a comprehensive collection of research papers on Large Language Model (LLM) agents. We organize papers across key categories including agent construction, collaboration mechanisms, evolution, tools, security, benchmarks, and applications.
Our taxonomy provides a structured framework for understanding the rapidly evolving field of LLM agents, from architectural foundations to practical implementations. The repository bridges fragmented research threads by highlighting connections between agent design principles and emergent behaviors.
The accompanying survey reflects the sharp increase in research publications on LLM agents since 2023.
- Agent Construction: Methodologies and architectures for building LLM agents
- Agent Collaboration: Frameworks for multi-agent interaction and cooperation
- Agent Evolution: Self-improvement and learning capabilities of agents
- Tools: Integration of external tools and APIs with LLM agents
- Security: Security concerns and protections for LLM agent systems
- Benchmarks: Evaluation frameworks and datasets for testing agent capabilities
- Applications: Real-world implementations and use cases
Title | Category | Year | URL |
---|---|---|---|
Adaptive Collaboration Strategy for LLMs in Medical Decision Making | Agent Collaboration | 2024 | link |
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | Agent Collaboration | 2024 | link |
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | Agent Collaboration | 2024 | link |
Debating with More Persuasive LLMs Leads to More Truthful Answers | Agent Collaboration | 2024 | link |
Roco: Dialectic multi-robot collaboration with large language models | Agent Collaboration | 2024 | link |
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning | Agent Collaboration | 2024 | link |
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding | Agent Collaboration | 2024 | link |
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | Agent Collaboration | 2024 | link |
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors | Agent Collaboration | 2024 | link |
Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Agent Collaboration | 2024 | link |
ChatDev: Communicative Agents for Software Development | Agent Collaboration | 2024 | link |
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | Agent Collaboration | 2024 | link |
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Agent Collaboration | 2024 | link |
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation | Agent Collaboration | 2023 | link |
Improving Factuality and Reasoning in Language Models through Multiagent Debate | Agent Collaboration | 2023 | link |
Autonomous chemical research with large language models | Agent Collaboration | 2023 | link |
Planning with Multi-Constraints via Collaborative Language Agents | Agent Construction | 2025 | link |
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Agent Construction | 2025 | link |
AutoAgents: A Framework for Automatic Agent Generation | Agent Construction | 2024 | link |
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | Agent Construction | 2024 | link |
Cognitive Architectures for Language Agents | Agent Construction | 2024 | link |
Executable Code Actions Elicit Better LLM Agents | Agent Construction | 2024 | link |
ChatDev: Communicative Agents for Software Development | Agent Construction | 2024 | link |
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents | Agent Construction | 2024 | link |
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Agent Construction | 2024 | link |
More Agents Is All You Need | Agent Construction | 2024 | link |
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | Agent Construction | 2024 | link |
Empowering biomedical discovery with AI agents | Agent Construction | 2024 | link |
SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models | Agent Construction | 2024 | link |
Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions | Agent Construction | 2024 | link |
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | Agent Construction | 2024 | link |
PlanCritic: Formal Planning with Human Feedback | Agent Construction | 2024 | link |
Enhancing Robot Task Planning: Integrating Environmental Information and Feedback Insights through Large Language Models | Agent Construction | 2024 | link |
Devil's Advocate: Anticipatory Reflection for LLM Agents | Agent Construction | 2024 | link |
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios | Agent Construction | 2024 | link |
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society | Agent Construction | 2023 | link |
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation | Agent Construction | 2023 | link |
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | Agent Construction | 2023 | link |
War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | Agent Construction | 2023 | link |
Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents | Agent Construction | 2023 | link |
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | Agent Construction | 2023 | link |
Evolutionary optimization of model merging recipes | Agent Evolution | 2025 | link |
CREAM: Consistency Regularized Self-Rewarding Language Models | Agent Evolution | 2025 | link |
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | Agent Evolution | 2025 | link |
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | Agent Evolution | 2024 | link |
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization | Agent Evolution | 2024 | link |
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning | Agent Evolution | 2024 | link |
A Survey on Self-Evolution of Large Language Models | Agent Evolution | 2024 | link |
LLM-Evolve: Evaluation for LLM’s Evolving Capability on Benchmarks | Agent Evolution | 2024 | link |
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing | Agent Evolution | 2024 | link |
Iterative Translation Refinement with Large Language Models | Agent Evolution | 2024 | link |
Agent Alignment in Evolving Social Norms | Agent Evolution | 2024 | link |
Mitigating the Alignment Tax of RLHF | Agent Evolution | 2024 | link |
Self-Rewarding Language Models | Agent Evolution | 2024 | link |
V-STaR: Training Verifiers for Self-Taught Reasoners | Agent Evolution | 2024 | link |
RLCD: Reinforcement learning from contrastive distillation for LM alignment | Agent Evolution | 2024 | link |
Language Model Self-Improvement by Reinforcement Learning Contemplation | Agent Evolution | 2024 | link |
ProAgent: Building Proactive Cooperative Agents with Large Language Models | Agent Evolution | 2024 | link |
Agent Planning with World Knowledge Model | Agent Evolution | 2024 | link |
Refining Guideline Knowledge for Agent Planning Using Textgrad | Agent Evolution | 2024 | link |
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | Agent Evolution | 2024 | link |
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error | Agent Evolution | 2024 | link |
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback | Agent Evolution | 2023 | link |
SELF-REFINE: Iterative Refinement with Self-Feedback | Agent Evolution | 2023 | link |
Self-Evolution Learning for Discriminative Language Model Pretraining | Agent Evolution | 2023 | link |
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning | Agent Evolution | 2023 | link |
SELFEVOLVE: A Code Evolution Framework via Large Language Models | Agent Evolution | 2023 | link |
SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions | Agent Evolution | 2023 | link |
Large Language Models are Better Reasoners with Self-Verification | Agent Evolution | 2023 | link |
CodeT: Code Generation with Generated Tests | Agent Evolution | 2023 | link |
Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games | Agent Evolution | 2023 | link |
Improving Factuality and Reasoning in Language Models through Multiagent Debate | Agent Evolution | 2023 | link |
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society | Agent Evolution | 2023 | link |
STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning | Agent Evolution | 2022 | link |
An active inference strategy for prompting reliable responses from large language models in medical practice | Applications | 2025 | link |
An evaluation framework for clinical use of large language models in patient interaction tasks | Applications | 2025 | link |
Large Language Models lack essential metacognition for reliable medical reasoning | Applications | 2025 | link |
Balancing autonomy and expertise in autonomous synthesis laboratories | Applications | 2025 | link |
Motif: Intrinsic Motivation from Artificial Intelligence Feedback | Applications | 2024 | link |
Baba Is AI: Break the Rules to Beat the Benchmark | Applications | 2024 | link |
Large language model-empowered agents for simulating macroeconomic activities | Applications | 2024 | link |
CompeteAI: Understanding the Competition Dynamics in Large Language Model-based Agents | Applications | 2024 | link |
Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support | Applications | 2024 | link |
Exploring Collaboration Mechanisms for LLM Agents | Applications | 2024 | link |
Simulating Human Society with Large Language Model Agents: City, Social Media, and Economic System | Applications | 2024 | link |
Can large language models transform computational social science? | Applications | 2024 | link |
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems | Applications | 2024 | link |
On Generative Agents in Recommendation | Applications | 2024 | link |
ChatDev: Communicative Agents for Software Development | Applications | 2024 | link |
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | Applications | 2024 | link |
SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning | Applications | 2024 | link |
Medical large language models are susceptible to targeted misinformation attacks | Applications | 2024 | link |
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Applications | 2023 | link |
Language Models Meet World Models: Embodied Experiences Enhance Language Models | Applications | 2023 | link |
ChessGPT: Bridging Policy Learning and Language Modeling | Applications | 2023 | link |
Mindagent: Emergent gaming interaction | Applications | 2023 | link |
Exploring large language models for communication games: An empirical study on Werewolf | Applications | 2023 | link |
Language as reality: a co-creative storytelling game experience in 1001 nights using generative AI | Applications | 2023 | link |
TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance | Applications | 2023 | link |
Using large language models to simulate multiple humans and replicate human subject studies | Applications | 2023 | link |
Generative Agents: Interactive Simulacra of Human Behavior | Applications | 2023 | link |
Self-collaboration Code Generation via ChatGPT | Applications | 2023 | link |
Language models can solve computer tasks | Applications | 2023 | link |
ChemCrow: Augmenting large-language models with chemistry tools | Applications | 2023 | link |
AlphaFlow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning | Applications | 2023 | link |
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | Applications | 2022 | link |
Stress-testing the resilience of the Austrian healthcare system using agent-based simulation | Applications | 2022 | link |
AgentHarm: Benchmarking Robustness of LLM Agents on Harmful Tasks | Datasets & Benchmarks | 2025 | link |
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator | Datasets & Benchmarks | 2025 | link |
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | Datasets & Benchmarks | 2025 | link |
DCA-Bench: A Benchmark for Dataset Curation Agents | Datasets & Benchmarks | 2025 | link |
MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents | Datasets & Benchmarks | 2025 | link |
MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Datasets & Benchmarks | 2025 | link |
EgoLife: Towards Egocentric Life Assistant | Datasets & Benchmarks | 2025 | link |
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | Datasets & Benchmarks | 2025 | link |
AgentBench: Evaluating LLMs as Agents | Datasets & Benchmarks | 2024 | link |
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Datasets & Benchmarks | 2024 | link |
BENCHAGENTS: Automated Benchmark Creation with Agent Interaction | Datasets & Benchmarks | 2024 | link |
Benchmarking Data Science Agents | Datasets & Benchmarks | 2024 | link |
Benchmarking Large Language Models as AI Research Agents | Datasets & Benchmarks | 2024 | link |
Benchmarking Large Language Models for Multi-agent Systems: A Comparative Analysis of AutoGen, CrewAI, and TaskWeaver | Datasets & Benchmarks | 2024 | link |
BLADE: Benchmarking Language Model Agents | Datasets & Benchmarks | 2024 | link |
CRAB: Cross-platform agent benchmark for multi-modal embodied language model agents | Datasets & Benchmarks | 2024 | link |
CToolEval: A Chinese Benchmark for LLM-Powered Agent Evaluation in Real-World API Interactions | Datasets & Benchmarks | 2024 | link |
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | Datasets & Benchmarks | 2024 | link |
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Datasets & Benchmarks | 2024 | link |
GTA: A Benchmark for General Tool Agents | Datasets & Benchmarks | 2024 | link |
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs | Datasets & Benchmarks | 2024 | link |
ML Research Benchmark | Datasets & Benchmarks | 2024 | link |
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains | Datasets & Benchmarks | 2024 | link |
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | Datasets & Benchmarks | 2024 | link |
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | Datasets & Benchmarks | 2024 | link |
Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs | Datasets & Benchmarks | 2024 | link |
Seal-Tools: Self-instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark | Datasets & Benchmarks | 2024 | link |
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Datasets & Benchmarks | 2024 | link |
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | Datasets & Benchmarks | 2024 | link |
Tur[k]ingBench: A Challenge Benchmark for Web Agents | Datasets & Benchmarks | 2024 | link |
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | Datasets & Benchmarks | 2024 | link |
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories | Datasets & Benchmarks | 2024 | link |
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | Datasets & Benchmarks | 2024 | link |
AgentTuning: Enabling Generalized Agent Abilities for LLMs | Datasets & Benchmarks | 2024 | link |
Executable Code Actions Elicit Better LLM Agents | Datasets & Benchmarks | 2024 | link |
FireAct: Toward Language Agent Fine-tuning | Datasets & Benchmarks | 2023 | link |
Medical large language models are vulnerable to data-poisoning attacks | Ethics | 2025 | link |
Foundation Models and Fair Use | Ethics | 2024 | link |
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model | Ethics | 2023 | link |
LLaMA: Open and Efficient Foundation Language Models | Ethics | 2023 | link |
Predictability and Surprise in Large Generative Models | Ethics | 2022 | link |
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 | Ethics | 2021 | link |
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets | Ethics | 2021 | link |
GPT-3: Its Nature, Scope, Limits, and Consequences | Ethics | 2020 | link |
Energy and Policy Considerations for Modern Deep Learning Research | Ethics | 2020 | link |
Defending Against Neural Fake News | Ethics | 2019 | link |
RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage | Security | 2025 | link |
Red-Teaming LLM Multi-Agent Systems via Communication Attacks | Security | 2025 | link |
Unveiling Privacy Risks in LLM Agent Memory | Security | 2025 | link |
AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks | Security | 2025 | link |
Firewalls to Secure Dynamic LLM Agentic Networks | Security | 2025 | link |
AutoHijacker: Automatic Indirect Prompt Injection Against Black-box LLM Agents | Security | 2025 | link |
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways | Security | 2025 | link |
DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent | Security | 2025 | link |
CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | Security | 2025 | link |
G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems | Security | 2025 | link |
AgentHarm: Benchmarking Robustness of LLM Agents on Harmful Tasks | Security | 2025 | link |
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks | Security | 2025 | link |
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems | Security | 2025 | |
LLM-based Multi-Agent Systems: Techniques and Business Perspectives | Security | 2024 | link |
BlockAgents: Towards Byzantine-Robust LLM-Based Multi-Agent Coordination via Blockchain | Security | 2024 | link |
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems | Security | 2024 | link |
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents | Security | 2024 | link |
AGENTPOISON: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Security | 2024 | link |
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | Security | 2024 | link |
Imprompter: Tricking LLM Agents into Improper Tool Use | Security | 2024 | link |
Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation | Security | 2024 | link |
Prompt Injection as a Defense Against LLM-driven Cyberattacks | Security | 2024 | link |
Evil Geniuses: Delving into the Safety of LLM-based Agents | Security | 2024 | link |
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents | Security | 2024 | link |
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | Security | 2024 | link |
CLAS 2024: The Competition for LLM and Agent Safety | Security | 2024 | link |
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents | Security | 2024 | link |
WIPI: A New Web Threat for LLM-Driven Web Agents | Security | 2024 | link |
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Security | 2024 | link |
CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | Security | 2024 | link |
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety | Security | 2024 | link |
Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In | Security | 2024 | link |
AGENT-SAFETYBENCH: Evaluating the Safety of LLM Agents | Security | 2024 | link |
INJECAGENT: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | Security | 2024 | link |
TrustAgent: Towards Safe and Trustworthy LLM-based Agents | Security | 2024 | link |
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents | Security | 2024 | link |
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Security | 2024 | link |
NetSafe: Exploring the Topological Safety of Multi-agent Networks | Security | 2024 | link |
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents | Security | 2024 | link |
Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey | Survey | 2025 | link |
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks | Survey | 2025 | link |
Multi-Agent Collaboration Mechanisms: A Survey of LLMs | Survey | 2025 | link |
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways | Survey | 2025 | link |
Large Model Based Agents: State-of-the-Art, Cooperation Paradigms, Security and Privacy, and Future Trends | Survey | 2024 | link |
Agent AI: Surveying the Horizons of Multimodal Interaction | Survey | 2024 | link |
Large Language Model based Multi-Agents: A Survey of Progress and Challenges | Survey | 2024 | link |
Large Multimodal Agents: A Survey | Survey | 2024 | link |
Understanding the planning of LLM agents: A survey | Survey | 2024 | link |
Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective | Survey | 2024 | link |
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Survey | 2024 | link |
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | Survey | 2024 | link |
Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects | Survey | 2024 | link |
Position Paper: Agent AI Towards a Holistic Intelligence | Survey | 2024 | link |
LLM With Tools: A Survey | Survey | 2024 | link |
A Survey on the Memory Mechanism of Large Language Model based Agents | Survey | 2024 | link |
A Survey on Large Language Model-Based Game Agents | Survey | 2024 | link |
Large Language Models and Games: A Survey and Roadmap | Survey | 2024 | link |
Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents | Survey | 2024 | link |
Security of AI Agents | Survey | 2024 | link |
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies | Survey | 2024 | link |
Inferring the Goals of Communicating Agents from Actions and Instructions | Survey | 2024 | link |
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations | Survey | 2024 | link |
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey | Survey | 2024 | link |
A survey on large language model based autonomous agents | Survey | 2023 | link |
The rise and potential of large language model based agents: a survey | Survey | 2023 | link |
Large Language Model Alignment: A Survey | Survey | 2023 | link |
Ethical and social risks of harm from Language Models | Survey | 2021 | link |
On the Opportunities and Risks of Foundation Models | Survey | 2021 | link |
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims | Survey | 2020 | link |
Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products | Survey | 2019 | link |
ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models | Tools | 2025 | link |
Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval | Tools | 2024 | link |
Chain of Tools: Large Language Model is an Automatic Multi-tool Learner | Tools | 2024 | link |
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction | Tools | 2024 | link |
ToolGen: Unified Tool Retrieval and Calling via Generation | Tools | 2024 | link |
ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph | Tools | 2024 | link |
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback | Tools | 2024 | link |
Making Language Models Better Tool Learners with Execution Feedback | Tools | 2024 | link |
Leveraging Large Language Models to Improve REST API Testing | Tools | 2024 | link |
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error | Tools | 2024 | link |
Skills-in-Context: Unlocking Compositionality in Large Language Models | Tools | 2024 | link |
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs | Tools | 2024 | link |
Gorilla: Large Language Model Connected with Massive APIs | Tools | 2024 | link |
Large Language Models as Tool Makers | Tools | 2024 | link |
Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents | Tools | 2023 | link |
Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations | Tools | 2023 | link |
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | Tools | 2023 | link |
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | Tools | 2023 | link |
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | Tools | 2023 | link |
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction | Tools | 2023 | link |
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs | Tools | 2023 | link |
ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models | Tools | 2023 | link |
ToolQA: A Dataset for LLM Question Answering with External Tools | Tools | 2023 | link |
On the Tool Manipulation Capability of Open-source Large Language Models | Tools | 2023 | link |
RestGPT: Connecting Large Language Models with Real-World RESTful APIs | Tools | 2023 | link |
Toolformer: Language Models Can Teach Themselves to Use Tools | Tools | 2023 | link |
WebCPM: Interactive Web Search for Chinese Long-form Question Answering | Tools | 2023 | link |
ToolCoder: Teach Code Generation Models to use API search tools | Tools | 2023 | link |
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases | Tools | 2023 | link |
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | Tools | 2023 | link |
MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting | Tools | 2023 | link |
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models | Tools | 2023 | link |
GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution | Tools | 2023 | link |
Dify | Tools | 2023 | link |
LangChain | Tools | 2023 | link |
WebGPT: Browser-assisted question-answering with human feedback | Tools | 2022 | link |
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance | Tools | 2020 | link |
We welcome contributions to expand our collection. You can:
- Submit a pull request to add papers or resources
- Open an issue to suggest additional papers or resources
- Submit your paper at our submission form or email us at [email protected]
We regularly update the repository to include new research.
If you find our survey helpful, please consider citing our work:
@article{agentsurvey2025,
title={Large Language Model Agent: A Survey on Methodology, Applications and Challenges},
author={Junyu Luo and Weizhi Zhang and Ye Yuan and Yusheng Zhao and Junwei Yang and Yiyang Gu and Bohan Wu and Binqi Chen and Ziyue Qiao and Qingqing Long and Rongcheng Tu and Xiao Luo and Wei Ju and Zhiping Xiao and Yifan Wang and Meng Xiao and Chenwu Liu and Jingyang Yuan and Shichang Zhang and Yiqiao Jin and Fan Zhang and Xian Wu and Hanqing Zhao and Dacheng Tao and Philip S. Yu and Ming Zhang},
journal={arXiv preprint arXiv:2503.21460},
year={2025}
}
For questions or suggestions, please open an issue or contact the repository maintainers.
Alternative AI tools for Awesome-Agent-Papers
Similar Open Source Tools

Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on LLM inference and serving.

Awesome-Resource-Efficient-LLM-Papers
A curated list of high-quality papers on resource-efficient Large Language Models (LLMs) with a focus on various aspects such as architecture design, pre-training, fine-tuning, inference, system design, and evaluation metrics. The repository covers topics like efficient transformer architectures, non-transformer architectures, memory efficiency, data efficiency, model compression, dynamic acceleration, deployment optimization, support infrastructure, and other related systems. It also provides detailed information on computation metrics, memory metrics, energy metrics, financial cost metrics, network communication metrics, and other metrics relevant to resource-efficient LLMs. The repository includes benchmarks for evaluating the efficiency of NLP models and references for further reading.

LLM4EC
LLM4EC is an interdisciplinary research repository focusing on the intersection of Large Language Models (LLM) and Evolutionary Computation (EC). It provides a comprehensive collection of papers and resources exploring various applications, enhancements, and synergies between LLM and EC. The repository covers topics such as LLM-assisted optimization, EA-based LLM architecture search, and applications in code generation, software engineering, neural architecture search, and other generative tasks. The goal is to facilitate research and development in leveraging LLM and EC for innovative solutions in diverse domains.

AudioLLM
AudioLLMs is a curated collection of research papers focusing on developing, implementing, and evaluating language models for audio data. The repository aims to provide researchers and practitioners with a comprehensive resource to explore the latest advancements in AudioLLMs. It includes models for speech interaction, speech recognition, speech translation, audio generation, and more. Additionally, it covers methodologies like multitask audioLLMs and segment-level Q-Former, as well as evaluation benchmarks like AudioBench and AIR-Bench. Adversarial attacks such as VoiceJailbreak are also discussed.

ai-game-development-tools
A tracker for AI game development tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice, and Analytics. Categories: Tool (AI LLM), Game (Agent), Code, Framework, Writer, Image, Texture, Shader, 3D Model, Avatar, Animation, Video, Audio, Music, Singing Voice, Speech, Analytics, Video Tool.

ai-reference-models
The Intel® AI Reference Models repository contains links to pre-trained models, sample scripts, best practices, and tutorials for popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors and Intel® Data Center GPUs. The purpose is to quickly replicate complete software environments showcasing the AI capabilities of Intel platforms. It includes optimizations for popular deep learning frameworks like TensorFlow and PyTorch, with additional plugins/extensions for improved performance. The repository is licensed under Apache License Version 2.0.

models
The Intel® AI Reference Models repository contains links to pre-trained models, sample scripts, best practices, and tutorials for popular open-source machine learning models optimized by Intel to run on Intel® Xeon® Scalable processors and Intel® Data Center GPUs. It aims to replicate the best-known performance of target model/dataset combinations in optimally-configured hardware environments. The repository will be deprecated upon the publication of v3.2.0 and will no longer be maintained or published.

Cool-GenAI-Fashion-Papers
Cool-GenAI-Fashion-Papers is a curated list of resources related to GenAI-Fashion, including papers, workshops, companies, and products. It covers a wide range of topics such as fashion design synthesis, outfit recommendation, fashion knowledge extraction, trend analysis, and more. The repository provides valuable insights and resources for researchers, industry professionals, and enthusiasts interested in the intersection of AI and fashion.

LLM-PlayLab
LLM-PlayLab is a repository containing various projects related to Large Language Model (LLM) fine-tuning, generative AI, time-series forecasting, and crash courses. It includes projects for text generation, sentiment analysis, data analysis, chat assistants, image captioning, and more. The repository offers a wide range of tools and resources for exploring and implementing advanced AI techniques.

awesome-llm-planning-reasoning
The 'Awesome LLMs Planning Reasoning' repository is a curated collection focusing on exploring the capabilities of Large Language Models (LLMs) in planning and reasoning tasks. It includes research papers, code repositories, and benchmarks that delve into innovative techniques, reasoning limitations, and standardized evaluations related to LLMs' performance in complex cognitive tasks. The repository serves as a comprehensive resource for researchers, developers, and enthusiasts interested in understanding the advancements and challenges in leveraging LLMs for planning and reasoning in real-world scenarios.

LLM4Opt
LLM4Opt is a collection of references and papers focusing on applying Large Language Models (LLMs) to diverse optimization tasks. The repository includes research papers, tutorials, workshops, competitions, and other collections related to LLMs in optimization. It covers a wide range of topics such as algorithm search, code generation, machine learning, science, industry, and more. The goal is to provide a comprehensive resource for researchers and practitioners interested in leveraging LLMs for optimization tasks.

Data-and-AI-Concepts
This repository is a curated collection of data science and AI concepts and interview questions (IQs), covering topics from foundational mathematics to cutting-edge generative AI. It aims to support learners and professionals preparing for various data science roles by providing detailed explanations and notebooks for each concept.