LLM-Agents-Papers

A repo that lists papers related to LLM-based agents



A repository that lists papers related to Large Language Model (LLM)-based agents. It covers topics including surveys, planning, feedback & reflection, memory mechanisms, role playing, game playing, tool usage & human-agent interaction, benchmarks & evaluation, environments & platforms, agent frameworks, multi-agent systems, and agent fine-tuning. Together, these form a broad collection of research on the architectures and applications of LLM-based agents.



✍️ Description

Last updated: 2025/03/02

A repo that lists papers related to LLM-based agents. It includes:

Survey
Technique For Enhancement
Interaction
Application
- Math
- Chemistry
- Biology
- Physics
- Geography
- Art
- Medicine
- Finance
- Software Engineering
- Research
Automation
- Workflow
- Automatic Evaluation
Training
- Fine-tuning
- RL
- DPO
Scaling
- Single-Agent Framework
- Multi-Agent System
Stability
Infrastructure
Others

💛 Recommendation

For more comprehensive reading, we also recommend other paper lists:

zjunlp/LLMAgentPapers: Must-read Papers on Large Language Model Agents.
teacherpeterpan/self-correction-llm-papers: This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback.
Paitesanshi/LLM-Agent-Survey: A Survey on LLM-based Autonomous Agents.
woooodyy/llm-agent-paper-list: Must-read papers for LLM-based agents.
git-disl/awesome-LLM-game-agent-papers: Must-read papers for LLM-based Game agents.

📰 Papers

Survey

[2025/02/20] Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | [paper] | [code]
[2025/02/18] Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents | [paper] | [code]
[2025/02/16] A Survey of LLM-based Agents in Medicine: How far are we from Baymax? | [paper] | [code]
[2025/01/15] Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | [paper] | [code]
[2024/12/23] A Survey on LLM-based Multi-Agent System: Recent Advances and New Frontiers in Application | [paper] | [code]
[2024/12/18] A Survey on Large Language Model-based Agents for Statistics and Data Science | [paper] | [code]
[2024/12/05] A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios | [paper] | [code]
[2024/12/04] From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | [paper] | [code]
[2024/11/27] Large Language Model-Brained GUI Agents: A Survey | [paper] | [code]
[2024/09/27] A Survey on Complex Tasks for Goal-Directed Interactive Agents | [paper] | [code]
[2024/09/13] Agents in Software Engineering: Survey, Landscape, and Vision | [paper] | [code]
[2024/09/04] A Survey on Emergent Language | [paper] | [code]
[2024/08/05] From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | [paper] | [code]
[2024/07/26] Large Language Model Agent in Financial Trading: A Survey | [paper] | [code]
[2024/06/03] Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization | [paper] | [code]
[2024/06/01] Towards Rationality in Language and Multimodal Agents: A Survey | [paper] | [code]
[2024/04/17] Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions | [paper] | [code]
[2024/04/02] A Survey on Large Language Model-Based Game Agents | [paper] | [code]
[2024/03/26] Leveraging Large Language Models in Human-Robot Interaction: A Critical Analysis of Potential and Pitfalls | [paper] | [code]
[2024/03/07] Promising and worth-to-try future directions for advancing state-of-the-art surrogates methods of agent-based models in social and health computational sciences | [paper] | [code]
[2024/02/28] Large Language Models and Games: A Survey and Roadmap | [paper] | [code]
[2024/02/28] A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems | [paper] | [code]
[2024/02/05] Understanding the planning of LLM agents: A survey | [paper] | [code]
[2024/01/01] If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | [paper] | [code]
[2023/12/31] A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots | [paper] | [code]
[2023/12/19] Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives | [paper] | [code]
[2023/09/14] The Rise and Potential of Large Language Model Based Agents: A Survey | [paper] | [code]
[2023/08/22] A Survey on Large Language Model based Autonomous Agents | [paper] | [code]
[2023/06/27] Next Steps for Human-Centered Generative AI: A Technical Perspective | [paper] | [code]

Technique For Enhancement

Planning

[2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]
[2025/02/06] Robotouille: An Asynchronous Planning Benchmark for LLM Agents | [paper] | [code]
[2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]
[2025/01/14] Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering | [paper] | [code]
[2024/12/30] Plancraft: an evaluation dataset for planning with LLM agents | [paper] | [code]
[2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]
[2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]
[2024/11/13] One STEP at a time: Language Agents are Stepwise Planners | [paper] | [code]
[2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]
[2024/10/12] CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device | [paper] | [code]
[2024/10/01] Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness | [paper] | [code]
[2024/09/30] Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface | [paper] | [code]
[2024/09/28] SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | [paper] | [code]
[2024/09/25] MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making | [paper] | [code]
[2024/08/15] VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool | [paper] | [code]
[2024/08/12] Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models | [paper] | [code]
[2024/08/01] AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | [paper] | [code]
[2024/07/04] Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models | [paper] | [code]
[2024/06/17] RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents | [paper] | [code]
[2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]
[2024/06/06] Tool-Planner: Task Planning with Clusters across Multiple Tools | [paper] | [code]
[2024/05/28] A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models | [paper] | [code]
[2024/05/27] REVECA: Adaptive Planning and Trajectory-based Validation in Cooperative Language Agents using Information Relevance and Relative Proximity | [paper] | [code]
[2024/04/21] Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following | [paper] | [code]
[2024/04/17] The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | [paper] | [code]
[2024/03/11] Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation | [paper] | [code]
[2024/03/10] TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision | [paper] | [code]
[2024/03/05] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | [paper] | [code]
[2024/02/29] PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval | [paper] | [code]
[2024/02/18] What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models | [paper] | [code]
[2024/02/18] PreAct: Prediction Enhances Agent's Planning Ability | [paper] | [code]
[2024/02/16] When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | [paper] | [code]
[2024/02/15] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | [paper] | [code]
[2024/02/09] Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity | [paper] | [code]
[2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]
[2024/02/02] TravelPlanner: A Benchmark for Real-World Planning with Language Agents | [paper] | [code]
[2024/01/10] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning | [paper] | [code]
[2023/11/19] TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | [paper] | [code]
[2023/10/12] Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | [paper] | [code]
[2023/10/09] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | [paper] | [code]
[2023/08/07] TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | [paper] | [code]
[2023/08/01] SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | [paper] | [code]
[2023/05/26] AdaPlanner: Adaptive Planning from Feedback with Language Models | [paper] | [code]
[2023/05/24] Reasoning with Language Model is Planning with World Model | [paper] | [code]
[2023/05/24] Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | [paper] | [code]
[2023/03/29] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks | [paper] | [code]
[2023/02/03] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | [paper] | [code]
[2022/12/08] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | [paper] | [code]

Memory Mechanism

[2025/02/17] A-MEM: Agentic Memory for LLM Agents | [paper] | [code]
[2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]
[2025/01/20] Zep: A Temporal Knowledge Graph Architecture for Agent Memory | [paper] | [code]
[2025/01/15] Doc-Guided Sent2Sent++: A Sent2Sent++ Agent with Doc-Guided memory for Document-level Machine Translation | [paper] | [code]
[2024/12/17] On the Structural Memory of LLM Agents | [paper] | [code]
[2024/12/17] Memory-Augmented Agent Training for Business Document Understanding | [paper] | [code]
[2024/10/10] DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory | [paper] | [code]
[2024/09/28] Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs | [paper] | [code]
[2024/09/11] Agent Workflow Memory | [paper] | [code]
[2024/09/01] Self-evolving Agents with reflective and memory-augmented abilities | [paper] | [code]
[2024/08/18] HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model | [paper] | [code]
[2024/08/07] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | [paper] | [code]
[2024/05/29] Toward Conversational Agents with Context and Time Sensitive Long-term Memory | [paper] | [code]
[2024/04/15] Memory Sharing for Large Language Model based Agents | [paper] | [code]
[2024/02/19] Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations | [paper] | [code]
[2024/02/07] InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory | [paper] | [code]
[2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]
[2024/01/05] From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models | [paper] | [code]
[2023/12/22] Empowering Working Memory for Large Language Model Agents | [paper] | [code]
[2023/12/22] Personalized Large Language Model Assistant with Evolving Conditional Memory | [paper] | [code]
[2023/11/10] JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | [paper] | [code]
[2023/06/06] ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory | [paper] | [code]
[2023/05/23] RET-LLM: Towards a General Read-Write Memory for Large Language Models | [paper] | [code]
[2023/05/17] MemoryBank: Enhancing Large Language Models with Long-Term Memory | [paper] | [code]
[2023/05/02] The Role of Summarization in Generative Agents: A Preliminary Perspective | [paper] | [code]
[2023/05/01] Learning to Reason and Memorize with Self-Notes | [paper] | [code]
[2023/04/26] Enhancing Large Language Model with Self-Controlled Memory Framework | [paper] | [code]
[2023/04/21] Emergent and Predictable Memorization in Large Language Models | [paper] | [code]

Feedback & Reflection

[2025/02/20] STeCa: Step-level Trajectory Calibration for LLM Agent Learning | [paper] | [code]
[2025/02/17] Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | [paper] | [code]
[2025/02/17] A Study on Leveraging Search and Self-Feedback for Agent Reasoning | [paper] | [code]
[2025/02/03] PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback | [paper] | [code]
[2025/01/26] Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection | [paper] | [code]
[2025/01/23] AgentRec: Agent Recommendation Using Sentence Embeddings Aligned to Human Feedback | [paper] | [code]
[2025/01/08] InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | [paper] | [code]
[2024/12/31] Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents | [paper] | [code]
[2024/12/22] A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops | [paper] | [code]
[2024/11/29] Training Agents with Weakly Supervised Feedback from Large Language Models | [paper] | [code]
[2024/11/21] Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework | [paper] | [code]
[2024/11/11] Using Generative AI and Multi-Agents to Provide Automatic Feedback | [paper] | [code]
[2024/11/04] Positive Experience Reflection for Agents in Interactive Text Environments | [paper] | [code]
[2024/10/29] Enhancing Financial Question Answering with a Multi-Agent Reflection Framework | [paper] | [code]
[2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]
[2024/10/25] OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | [paper] | [code]
[2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]
[2024/10/20] Training Language Models to Critique With Multi-agent Feedback | [paper] | [code]
[2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]
[2024/10/08] DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | [paper] | [code]
[2024/10/02] ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning | [paper] | [code]
[2024/10/02] RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | [paper] | [code]
[2024/09/18] MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | [paper] | [code]
[2024/09/05] E2CL: Exploration-based Error Correction Learning for Embodied Agents | [paper] | [code]
[2024/09/01] Self-evolving Agents with reflective and memory-augmented abilities | [paper] | [code]
[2024/08/30] Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios | [paper] | [code]
[2024/08/15] MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL | [paper] | [code]
[2024/07/25] Recursive Introspection: Teaching Language Model Agents How to Self-Improve | [paper] | [code]
[2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]
[2024/06/05] LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | [paper] | [code]
[2024/06/03] Re-ReST: Reflection-Reinforced Self-Training for Language Agents | [paper] | [code]
[2024/03/18] QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback-based Self-Correction | [paper] | [code]
[2024/03/17] Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback | [paper] | [code]
[2024/03/08] ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | [paper] | [code]
[2024/03/04] Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents | [paper] | [code]
[2024/02/27] Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization | [paper] | [code]
[2024/02/26] SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection | [paper] | [code]
[2024/02/22] Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning | [paper] | [code]
[2024/02/19] A Critical Evaluation of AI Feedback for Aligning Large Language Models | [paper] | [code]
[2024/02/06] AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | [paper] | [code]
[2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]
[2023/11/14] The ART of LLM Refinement: Ask, Refine, and Trust | [paper] | [code]
[2023/10/31] Learning From Mistakes Makes LLM Better Reasoner | [paper] | [code]
[2023/10/12] A Zero-Shot Language Agent for Computer Control with Structured Reflection | [paper] | [code]
[2023/07/27] PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | [paper] | [code]
[2023/05/22] Making Language Models Better Tool Learners with Execution Feedback | [paper] | [code]
[2023/05/17] Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback | [paper] | [code]
[2023/04/21] Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback | [paper] | [code]
[2023/04/11] Teaching Large Language Models to Self-Debug | [paper] | [code]
[2023/03/30] Self-Refine: Iterative Refinement with Self-Feedback | [paper] | [code]

RAG

[2025/02/25] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents | [paper] | [code]
[2025/02/19] RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision | [paper] | [code]
[2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]
[2025/02/06] Enhancing Online Learning Efficiency Through Heterogeneous Resource Integration with a Multi-Agent RAG System | [paper] | [code]
[2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]
[2024/12/31] MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation | [paper] | [code]
[2024/12/24] GeAR: Graph-enhanced Agent for Retrieval-augmented Generation | [paper] | [code]
[2024/12/20] Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG | [paper] | [code]
[2024/12/16] BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search for Scientific Q&A | [paper] | [code]
[2024/12/07] SLA Management in Reconfigurable Multi-Agent RAG: A Systems Approach to Question Answering | [paper] | [code]
[2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]
[2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]
[2024/10/18] Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases | [paper] | [code]
[2024/10/01] Conversational Exploratory Search of Scholarly Publications Using Knowledge Graphs | [paper] | [code]
[2024/09/28] Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs | [paper] | [code]
[2024/08/18] Agentic Retrieval-Augmented Generation for Time Series Analysis | [paper] | [code]
[2024/08/05] LLM Agents Improve Semantic Code Search | [paper] | [code]
[2024/08/03] MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance | [paper] | [code]
[2024/07/20] Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base | [paper] | [code]
[2024/06/26] Geode: A Zero-shot Geospatial Question-Answering Agent with Explicit Reasoning and Precise Spatio-Temporal Retrieval | [paper] | [code]
[2024/06/19] StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation | [paper] | [code]
[2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]
[2024/03/05] AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation | [paper] | [code]
[2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]
[2023/12/27] Automating Knowledge Acquisition for Content-Centric Cognitive Agents Using LLMs | [paper] | [code]

Search

[2025/02/20] I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search | [paper] | [code]
[2025/02/18] R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | [paper] | [code]
[2025/02/18] Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks | [paper] | [code]
[2025/02/17] A Study on Leveraging Search and Self-Feedback for Agent Reasoning | [paper] | [code]
[2025/02/05] SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs | [paper] | [code]
[2025/02/02] Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search | [paper] | [code]
[2025/01/31] KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search | [paper] | [code]
[2025/01/09] Search-o1: Agentic Search-Enhanced Large Reasoning Models | [paper] | [code]
[2024/12/24] A Novel Task-Driven Method with Evolvable Interactive Agents Using Event Trees for Enhanced Emergency Decision Support | [paper] | [code]
[2024/12/22] Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | [paper] | [code]
[2024/12/05] Agent AI with LangGraph: A Modular Framework for Enhancing Machine Translation Using Large Language Models | [paper] | [code]
[2024/11/07] CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | [paper] | [code]
[2024/10/29] Synergizing LLM Agents and Knowledge Graph for Socioeconomic Prediction in LBSN | [paper] | [code]
[2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]
[2024/10/22] SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning | [paper] | [code]
[2024/10/13] Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | [paper] | [code]
[2024/10/13] LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | [paper] | [code]
[2024/10/02] ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning | [paper] | [code]
[2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]
[2024/07/01] Tree Search for Language Model Agents | [paper] | [code]
[2024/06/17] Input Conditioned Graph Generation for Language Agents | [paper] | [code]
[2024/02/17] KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph | [paper] | [code]
[2024/02/16] When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | [paper] | [code]
[2024/02/09] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models | [paper] | [code]
[2023/05/17] Tree of Thoughts: Deliberate Problem Solving with Large Language Models | [paper] | [code]

Interaction

Role Playing

[2025/02/20] InstructAgent: Building User Controllable Recommender via LLM Agent | [paper] | [code]
[2025/02/18] SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems | [paper] | [code]
[2025/02/17] Can LLM Agents Maintain a Persona in Discourse? | [paper] | [code]
[2025/02/17] LM Agents for Coordinating Multi-User Information Gathering | [paper] | [code]
[2025/02/16] SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention | [paper] | [code]
[2025/02/13] Language Agents as Digital Representatives in Collective Decision-Making | [paper] | [code]
[2025/02/06] PsyPlay: Personality-Infused Role-Playing Conversational Agents | [paper] | [code]
[2025/02/03] Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant | [paper] | [code]
[2025/01/23] AgentRec: Agent Recommendation Using Sentence Embeddings Aligned to Human Feedback | [paper] | [code]
[2025/01/15] Personality Modeling for Persuasion of Misinformation using AI Agent | [paper] | [code]
[2024/12/28] BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters | [paper] | [code]
[2024/12/22] Modular Conversational Agents for Surveys and Interviews | [paper] | [code]
[2024/12/11] SweetieChat: A Strategy-Enhanced Role-playing Framework for Diverse Scenarios Handling Emotional Support Agent | [paper] | [code]
[2024/12/10] My Words Imply Your Opinion: Reader Agent-Based Propagation Enhancement for Personalized Implicit Emotion Analysis | [paper] | [code]
[2024/11/21] Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning | [paper] | [code]
[2024/11/19] Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction | [paper] | [code]
[2024/11/12] SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | [paper] | [code]
[2024/11/04] A Multi-Task Role-Playing Agent Capable of Imitating Character Linguistic Styles | [paper] | [code]
[2024/10/28] Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments | [paper] | [code]
[2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]
[2024/09/23] ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning | [paper] | [code]
[2024/09/19] FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists | [paper] | [code]
[2024/09/12] TravelAgent: An AI Assistant for Personalized Travel Planning | [paper] | [code]
[2024/09/11] Using Generative Agents to Create Tip Sheets for Investigative Data Reporting | [paper] | [code]
[2024/08/28] Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | [paper] | [code]
[2024/08/21] Drama Engine: A Framework for Narrative Agents | [paper] | [code]
[2024/06/24] The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents | [paper] | [code]
[2024/06/17] HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing | [paper] | [code]
[2024/06/11] Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models | [paper] | [code]
[2024/06/09] Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions | [paper] | [code]
[2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]
[2024/05/10] LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | [paper] | [code]
[2024/05/08] LLMs with Personalities in Multi-issue Negotiation Games | [paper] | [code]
[2024/05/06] Large Language Models (LLMs) as Agents for Augmented Democracy | [paper] | [code]
[2024/05/02] GAIA: A General AI Assistant for Intelligent Accelerator Operations | [paper] | [code]
[2024/05/01] "Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time | [paper] | [code]
[2024/04/26] Large Language Model Agent as a Mechanical Designer | [paper] | [code]
[2024/04/19] Cooperative Sentiment Agents for Multimodal Sentiment Analysis | [paper] | [code]
[2024/03/31] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model | [paper] | [code]
[2024/03/23] EduAgent: Generative Student Agents in Learning | [paper] | [code]
[2024/03/19] Characteristic AI Agents via Large Language Models | [paper] | [code]
[2024/03/15] VideoAgent: Long-form Video Understanding with Large Language Model as Agent | [paper] | [code]
[2024/03/13] Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation | [paper] | [code]
[2024/02/29] On the Decision-Making Abilities in Role-Playing using Large Language Models | [paper] | [code]
[2024/02/28] Prospect Personalized Recommendation on Large Language Model-based Agent Platform | [paper] | [code]
[2024/02/26] Language Agents as Optimizable Graphs | [paper] | [code]
[2024/02/22] Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering | [paper] | [code]
[2024/02/22] Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | [paper] | [code]
[2024/02/21] Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | [paper] | [code]
[2024/02/19] Stick to your Role! Stability of Personal Values Expressed in Large Language Models | [paper] | [code]
[2024/02/18] Modelling Political Coalition Negotiations Using LLM-based Agents | [paper] | [code]
[2024/02/06] Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies | [paper] | [code]
[2024/02/06] Can Generative Agents Predict Emotion? | [paper] | [code]
[2024/02/05] GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | [paper] | [code]
[2024/01/31] LLMs Simulate Big Five Personality Traits: Further Evidence | [paper] | [code]
[2023/12/22] Personalized Large Language Model Assistant with Evolving Conditional Memory | [paper] | [code]
[2023/12/21] ChatGPT as a commenter to the news: can LLMs generate human-like opinions? | [paper] | [code]
[2023/12/20] Machine Mindset: An MBTI Exploration of Large Language Models | [paper] | [code]
[2023/12/19] Can ChatGPT be Your Personal Medical Assistant? | [paper] | [code]
[2023/10/13] AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems | [paper] | [code]
[2023/10/01] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | [paper] | [code]
[2023/09/02] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models | [paper] | [code]
[2023/08/22] Towards an On-device Agent for Text Rewriting | [paper] | [code]
[2023/08/10] LLM As DBA | [paper] | [code]
[2023/08/03] InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent | [paper] | [code]
[2023/07/11] Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | [paper] | [code]
[2023/07/05] Building Cooperative Embodied Agents Modularly with Large Language Models | [paper] | [code]
[2023/05/25] Role-Play with Large Language Models | [paper] | [code]
[2023/05/09] TidyBot: Personalized Robot Assistance with Large Language Models | [paper] | [code]

Conversation

[2025/02/24] Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents | [paper] | [code]
[2025/02/20] Enhancing Conversational Agents with Theory of Mind: Aligning Beliefs, Desires, and Intentions for Human-Like Interaction | [paper] | [code]
[2025/02/18] One Size doesn't Fit All: A Personalized Conversational Tutoring Agent for Mathematics Instruction | [paper] | [code]
[2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]
[2025/02/18] You need to MIMIC to get FAME: Solving Meeting Transcript Scarcity with a Multi-Agent Conversations | [paper] | [code]
[2025/02/17] InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context | [paper] | [code]
[2025/02/13] Reliable Conversational Agents under ASP Control that Understand Natural Language | [paper] | [code]
[2025/02/12] Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model | [paper] | [code]
[2025/02/09] MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents | [paper] | [code]
[2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]
[2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]
[2025/02/06] PsyPlay: Personality-Infused Role-Playing Conversational Agents | [paper] | [code]
[2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]
[2025/01/23] Communicating Activations Between Language Model Agents | [paper] | [code]
[2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]
[2025/01/14] Developing Enhanced Conversational Agents for Social Virtual Worlds | [paper] | [code]
[2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]
[2024/12/30] Exploring and Controlling Diversity in LLM-Agent Conversation | [paper] | [code]
[2024/12/24] Extracting triples from dialogues for conversational social agents | [paper] | [code]
[2024/12/22] Modular Conversational Agents for Surveys and Interviews | [paper] | [code]
[2024/12/21] InfoTech Assistant : A Multimodal Conversational Agent for InfoTechnology Web Portal Queries | [paper] | [code]
[2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]
[2024/12/06] CALICO: Conversational Agent Localization via Synthetic Data Generation | [paper] | [code]
[2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]
[2024/12/01] Examining Identity Drift in Conversations of LLM Agents | [paper] | [code]
[2024/11/07] Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model | [paper] | [code]
[2024/11/07] Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations | [paper] | [code]
[2024/11/06] MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | [paper] | [code]
[2024/11/01] DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems | [paper] | [code]
[2024/11/01] ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents | [paper] | [code]
[2024/10/29] MARCO: Multi-Agent Real-time Chat Orchestration | [paper] | [code]
[2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]
[2024/10/18] Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents | [paper] | [code]
[2024/10/15] HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications | [paper] | [code]
[2024/10/10] Rewriting Conversational Utterances with Instructed Large Language Models | [paper] | [code]
[2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]
[2024/09/23] Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents | [paper] | [code]
[2024/09/13] AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents | [paper] | [code]
[2024/09/06] Sparse Rewards Can Self-Train Dialogue Agents | [paper] | [code]
[2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]
[2024/08/27] Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | [paper] | [code]
[2024/08/22] MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents | [paper] | [code]
[2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]
[2024/08/06] OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | [paper] | [code]
[2024/08/03] Self-Emotion Blended Dialogue Generation in Social Simulation Agents | [paper] | [code]
[2024/07/31] Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent | [paper] | [code]
[2024/07/13] Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues | [paper] | [code]
[2024/07/04] Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models | [paper] | [code]
[2024/07/01] Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents | [paper] | [code]
[2024/06/30] CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations | [paper] | [code]
[2024/06/09] Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions | [paper] | [code]
[2024/05/29] Toward Conversational Agents with Context and Time Sensitive Long-term Memory | [paper] | [code]
[2024/05/16] Speaker Verification in Agent-Generated Conversations | [paper] | [code]
[2024/04/19] Towards Human-centered Proactive Conversational Agents | [paper] | [code]
[2024/04/10] Apollonion: Profile-centric Dialog Agent | [paper] | [code]
[2024/03/17] Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback | [paper] | [code]
[2024/03/08] ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | [paper] | [code]
[2024/02/25] Understanding Public Perceptions of AI Conversational Agents: A Cross-Cultural Analysis | [paper] | [code]
[2024/02/23] On the Multi-turn Instruction Following for Conversational Web Agents | [paper] | [code]
[2024/02/20] CHATATC: Large Language Model-Driven Conversational Agents for Supporting Strategic Air Traffic Flow Management | [paper] | [code]
[2024/01/29] Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues | [paper] | [code]
[2024/01/10] Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk | [paper] | [code]
[2024/01/02] CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | [paper] | [code]
[2023/12/21] Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System | [paper] | [code]
[2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]
[2023/10/01] Adapting LLM Agents Through Communication | [paper] | [code]
[2023/06/28] Inferring the Goals of Communicating Agents from Actions and Instructions | [paper] | [code]
[2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]
[2023/03/31] CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society | [paper] | [code]

Game Playing

[2025/02/01] Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents | [paper] | [code]
[2025/01/24] Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game | [paper] | [code]
[2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]
[2024/11/08] Game-theoretic LLM: Agent Workflow for Negotiation Games | [paper] | [code]
[2024/10/28] Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games | [paper] | [code]
[2024/09/03] An Implementation of Werewolf Agent That does not Truly Trust LLMs | [paper] | [code]
[2024/08/05] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | [paper] | [code]
[2024/07/23] AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | [paper] | [code]
[2024/07/17] A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | [paper] | [code]
[2024/06/27] OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | [paper] | [code]
[2024/06/07] GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | [paper] | [code]
[2024/06/05] The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games | [paper] | [code]
[2024/05/24] Hacc-Man: An Arcade Game for Jailbreaking LLMs | [paper] | [code]
[2024/05/23] Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication | [paper] | [code]
[2024/05/08] LLMs with Personalities in Multi-issue Negotiation Games | [paper] | [code]
[2024/04/30] PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | [paper] | [code]
[2024/04/03] Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game | [paper] | [code]
[2024/03/28] MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs | [paper] | [code]
[2024/03/18] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | [paper] | [code]
[2024/02/19] PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents | [paper] | [code]
[2024/02/13] Large Language Models as Minecraft Agents | [paper] | [code]
[2024/02/12] Large Language Models as Agents in Two-Player Games | [paper] | [code]
[2024/02/04] Enhance Reasoning for Large Language Models in the Game Werewolf | [paper] | [code]
[2024/02/02] PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models | [paper] | [code]
[2023/12/29] Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game | [paper] | [code]
[2023/12/01] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games | [paper] | [code]
[2023/10/31] Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models | [paper] | [code]
[2023/09/29] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 | [paper] | [code]
[2023/09/18] MindAgent: Emergent Gaming Interaction | [paper] | [code]
[2023/09/10] An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents | [paper] | [code]
[2023/09/09] Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf | [paper] | [code]
[2023/08/23] Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis | [paper] | [code]
[2023/05/31] Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models | [paper] | [code]
[2023/05/26] Playing repeated games with Large Language Models | [paper] | [code]
[2023/05/25] Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | [paper] | [code]
[2023/05/08] Knowledge-enhanced Agents for Interactive Text Games | [paper] | [code]
[2023/04/06] Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions | [paper] | [code]

Human-Agent Interaction

[2025/02/17] Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | [paper] | [code]
[2025/01/28] CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation | [paper] | [code]
[2024/12/20] Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration | [paper] | [code]
[2024/06/28] Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task | [paper] | [code]
[2024/06/11] Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models | [paper] | [code]
[2024/06/02] Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction | [paper] | [code]
[2024/03/05] ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary | [paper] | [code]
[2024/02/20] Large Language Model-based Human-Agent Collaboration for Complex Task Solving | [paper] | [code]
[2024/02/18] Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models | [paper] | [code]
[2024/02/17] MONAL: Model Autophagy Analysis for Modeling Human-AI Interactions | [paper] | [code]
[2023/09/22] Learning to Coordinate with Anyone | [paper] | [code]
[2023/07/31] HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution | [paper] | [code]
[2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]

Tool Usage

[2025/02/27] Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | [paper] | [code]
[2025/02/24] MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions | [paper] | [code]
[2025/02/24] Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration | [paper] | [code]
[2025/02/17] LLM Agents Making Agent Tools | [paper] | [code]
[2025/02/17] SMART: Self-Aware Agent for Tool Overuse Mitigation | [paper] | [code]
[2025/02/16] OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning | [paper] | [code]
[2025/02/12] Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model | [paper] | [code]
[2025/02/07] Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | [paper] | [code]
[2025/02/06] Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents | [paper] | [code]
[2025/02/05] ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation | [paper] | [code]
[2025/01/28] CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation | [paper] | [code]
[2025/01/21] UI-TARS: Pioneering Automated GUI Interaction with Native Agents | [paper] | [code]
[2025/01/20] Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks | [paper] | [code]
[2025/01/20] PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents | [paper] | [code]
[2025/01/08] InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | [paper] | [code]
[2025/01/08] FinSphere: A Conversational Stock Analysis Agent Equipped with Quantitative Tools based on Real-Time Database | [paper] | [code]
[2025/01/07] PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides | [paper] | [code]
[2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]
[2024/12/21] InfoTech Assistant : A Multimodal Conversational Agent for InfoTechnology Web Portal Queries | [paper] | [code]
[2024/12/12] AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials | [paper] | [code]
[2024/12/08] Cooperative SQL Generation for Segmented Databases By Using Multi-functional LLM Agents | [paper] | [code]
[2024/12/05] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | [paper] | [code]
[2024/11/26] ShowUI: One Vision-Language-Action Model for GUI Visual Agent | [paper] | [code]
[2024/11/22] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | [paper] | [code]
[2024/11/20] AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | [paper] | [code]
[2024/11/15] The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use | [paper] | [code]
[2024/11/04] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | [paper] | [code]
[2024/11/04] Attacking Vision-Language Computer Agents via Pop-ups | [paper] | [code]
[2024/11/02] Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage | [paper] | [code]
[2024/10/28] AutoGLM: Autonomous Foundation Agents for GUIs | [paper] | [code]
[2024/10/25] OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | [paper] | [code]
[2024/10/24] Infogent: An Agent-Based Framework for Web Information Aggregation | [paper] | [code]
[2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]
[2024/10/22] Large Language Models Empowered Personalized Web Agents | [paper] | [code]
[2024/10/21] VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use | [paper] | [code]
[2024/10/21] Beyond Browsing: API-Based Web Agents | [paper] | [code]
[2024/10/18] Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases | [paper] | [code]
[2024/10/17] Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | [paper] | [code]
[2024/10/17] MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling | [paper] | [code]
[2024/10/17] MobA: A Two-Level Agent System for Efficient Mobile Task Automation | [paper] | [code]
[2024/10/17] AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | [paper] | [code]
[2024/10/16] Agent Skill Acquisition for Large Language Models via CycleQD | [paper] | [code]
[2024/10/10] Agent S: An Open Agentic Framework that Uses Computers Like a Human | [paper] | [code]
[2024/10/07] Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | [paper] | [code]
[2024/10/03] NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild | [paper] | [code]
[2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]
[2024/09/17] EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage | [paper] | [code]
[2024/09/01] TinyAgent: Function Calling at the Edge | [paper] | [code]
[2024/08/30] Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios | [paper] | [code]
[2024/08/15] VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool | [paper] | [code]
[2024/08/05] Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | [paper] | [code]
[2024/08/01] OmniParser for Pure Vision Based GUI Agent | [paper] | [code]
[2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]
[2024/07/22] AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? | [paper] | [code]
[2024/07/11] GTA: A Benchmark for General Tool Agents | [paper] | [code]
[2024/07/01] Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents | [paper] | [code]
[2024/06/17] GUICourse: From General Vision Language Models to Versatile GUI Agents | [paper] | [code]
[2024/06/16] GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | [paper] | [code]
[2024/06/06] Tool-Planner: Task Planning with Clusters across Multiple Tools | [paper] | [code]
[2024/06/03] Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | [paper] | [code]
[2024/06/02] Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction | [paper] | [code]
[2024/05/30] Large Language Models Can Self-Improve At Web Agent Tasks | [paper] | [code]
[2024/05/17] Latent State Estimation Helps UI Agents to Reason | [paper] | [code]
[2024/05/06] SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | [paper] | [code]
[2024/05/02] CACTUS: Chemistry Agent Connecting Tool-Usage to Science | [paper] | [code]
[2024/05/01] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | [paper] | [code]
[2024/04/23] Evaluating Tool-Augmented Agents in Remote Sensing Platforms | [paper] | [code]
[2024/04/17] The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | [paper] | [code]
[2024/04/17] Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent | [paper] | [code]
[2024/04/16] Grounded Language Agent for Product Search via Intelligent Web Interactions | [paper] | [code]
[2024/04/04] AutoWebGLM: A Large Language Model-based Web Navigating Agent | [paper] | [code]
[2024/04/01] Rapid Mobile App Development for Generative AI Agents on MIT App Inventor | [paper] | [code]
[2024/03/05] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | [paper] | [code]
[2024/03/05] Android in the Zoo: Chain-of-Action-Thought for GUI Agents | [paper] | [code]
[2024/02/27] BASES: Large-scale Web Search User Simulation with Large Language Model based Agents | [paper] | [code]
[2024/02/26] Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models | [paper] | [code]
[2024/02/23] On the Multi-turn Instruction Following for Conversational Web Agents | [paper] | [code]
[2024/02/20] AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning | [paper] | [code]
[2024/02/18] SciAgent: Tool-augmented Language Models for Scientific Reasoning | [paper] | [code]
[2024/02/16] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages | [paper] | [code]
[2024/02/08] UFO: A UI-Focused Agent for Windows OS Interaction | [paper] | [code]
[2024/02/06] AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | [paper] | [code]
[2024/01/11] EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction | [paper] | [code]
[2024/01/03] GPT-4V(ision) is a Generalist Web Agent, if Grounded | [paper] | [code]
[2023/12/21] AppAgent: Multimodal Agents as Smartphone Users | [paper] | [code]
[2023/12/18] CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | [paper] | [code]
[2023/12/14] CogAgent: A Visual Language Model for GUI Agents | [paper] | [code]
[2023/11/19] TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | [paper] | [code]
[2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]
[2023/11/10] Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations | [paper] | [code]
[2023/10/12] A Zero-Shot Language Agent for Computer Control with Structured Reflection | [paper] | [code]
[2023/08/07] TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | [paper] | [code]
[2023/06/09] Mind2Web: Towards a Generalist Agent for the Web | [paper] | [code]
[2023/05/22] Making Language Models Better Tool Learners with Execution Feedback | [paper] | [code]
[2023/05/19] ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | [paper] | [code]

Simulation

[2025/02/06] Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents | [paper] | [code]
[2025/02/03] Eliciting Language Model Behaviors with Investigator Agents | [paper] | [code]
[2025/01/25] Are Human Interactions Replicable by Generative Agents? A Case Study on Pronoun Usage in Hierarchical Interactions | [paper] | [code]
[2025/01/19] Self-Explanation in Social AI Agents | [paper] | [code]
[2025/01/12] LLMs Model Non-WEIRD Populations: Experiments with Synthetic Cultural Agents | [paper] | [code]
[2024/12/10] Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models | [paper] | [code]
[2024/11/18] OASIS: Open Agent Social Interaction Simulations with One Million Agents | [paper] | [code]
[2024/10/28] ElectionSim: Massive Population Election Simulation Powered by Large Language Model Driven Agents | [paper] | [code]
[2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]
[2024/10/18] SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent | [paper] | [code]
[2024/10/05] Large Language Models can Achieve Social Balance | [paper] | [code]
[2024/09/25] Plurals: A System for Guiding LLMs Via Simulated Social Ensembles | [paper] | [code]
[2024/09/14] Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models | [paper] | [code]
[2024/09/02] Agentic Society: Merging skeleton from real world and texture from Large Language Model | [paper] | [code]
[2024/08/28] Logic-Enhanced Language Model Agents for Trustworthy Social Simulations | [paper] | [code]
[2024/08/15] AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents | [paper] | [code]
[2024/08/03] Self-Emotion Blended Dialogue Generation in Social Simulation Agents | [paper] | [code]
[2024/06/26] Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship | [paper] | [code]
[2024/06/20] Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory | [paper] | [code]
[2024/06/10] Can Language Models Serve as Text-Based World Simulators? | [paper] | [code]
[2024/05/12] Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design | [paper] | [code]
[2024/04/23] BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis | [paper] | [code]
[2024/03/20] AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior | [paper] | [code]
[2024/03/05] AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation | [paper] | [code]
[2024/02/26] Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation | [paper] | [code]
[2024/02/20] What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents | [paper] | [code]
[2024/02/07] Can Large Language Model Agents Simulate Human Trust Behavior? | [paper] | [code]
[2024/01/08] SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | [paper] | [code]
[2023/12/06] LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem | [paper] | [code]
[2023/11/28] War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | [paper] | [code]
[2023/10/10] MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents | [paper] | [code]
[2023/06/05] User Behavior Simulation with Large Language Model based Agents | [paper] | [code]
[2023/05/26] Training Socially Aligned Language Models on Simulated Social Interactions | [paper] | [code]
[2023/04/07] Generative Agents: Interactive Simulacra of Human Behavior | [paper] | [code]

Application

Math

[2025/02/25] LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena | [paper] | [code]
[2025/02/18] One Size doesn't Fit All: A Personalized Conversational Tutoring Agent for Mathematics Instruction | [paper] | [code]
[2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]
[2024/10/13] Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | [paper] | [code]
[2024/08/03] MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems | [paper] | [code]
[2024/04/10] MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education | [paper] | [code]
[2024/04/06] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | [paper] | [code]

Chemistry

[2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]
[2025/01/11] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning | [paper] | [code]
[2024/08/29] HoneyComb: A Flexible LLM-Based Agent System for Materials Science | [paper] | [code]
[2024/06/26] A Review of Large Language Models and Autonomous Agents in Chemistry | [paper] | [code]

Biology

[2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]
[2024/06/29] BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science | [paper] | [code]
[2024/05/25] GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | [paper] | [code]
[2024/04/27] CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | [paper] | [code]
[2024/04/03] Empowering Biomedical Discovery with AI Agents | [paper] | [code]
[2024/01/27] ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning | [paper] | [code]

Physics

[2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]
[2024/12/09] StarWhisper Telescope: Agent-Based Observation Assistant System to Approach AI Astrophysicist | [paper] | [code]
[2024/08/29] HoneyComb: A Flexible LLM-Based Agent System for Materials Science | [paper] | [code]
[2024/01/27] ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning | [paper] | [code]

Geography

[2024/12/23] MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models | [paper] | [code]
[2024/07/13] An Autonomous GIS Agent Framework for Geospatial Data Retrieval | [paper] | [code]

Art

[2025/01/22] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | [paper] | [code]
[2024/10/02] Agent-Driven Large Language Models for Mandarin Lyric Generation | [paper] | [code]
[2024/09/05] LLM-based multi-agent poetry generation in non-cooperative environments | [paper] | [code]
[2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]
[2024/07/01] IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation | [paper] | [code]
[2024/04/28] ComposerX: Multi-Agent Symbolic Music Composition with LLMs | [paper] | [code]
[2024/03/12] AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | [paper] | [code]
[2023/10/18] MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | [paper] | [code]

Medicine

[2025/02/27] M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging | [paper] | [code]
[2025/02/26] MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis | [paper] | [code]
[2025/02/25] Scaffolding Empathy: Training Counselors with Simulated Patients and Utterance-level Performance Visualizations | [paper] | [code]
[2025/02/19] LIDDIA: Language-based Intelligent Drug Discovery Agent | [paper] | [code]
[2025/02/18] An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation | [paper] | [code]
[2025/02/18] Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions | [paper] | [code]
[2025/02/13] PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology | [paper] | [code]
[2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]
[2025/02/05] CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration | [paper] | [code]
[2025/02/02] Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model | [paper] | [code]
[2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]
[2025/01/16] AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling | [paper] | [code]
[2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]
[2024/12/19] PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | [paper] | [code]
[2024/12/17] RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team | [paper] | [code]
[2024/12/16] LLMs Can Simulate Standardized Patients via Agent Coevolution | [paper] | [code]
[2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]
[2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]
[2024/12/02] Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | [paper] | [code]
[2024/11/21] PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation | [paper] | [code]
[2024/11/16] Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios | [paper] | [code]
[2024/11/03] EcoAct: Economic Agent Determines When to Register What Action | [paper] | [code]
[2024/10/25] PatentAgent: Intelligent Agent for Automated Pharmaceutical Patent Analysis | [paper] | [code]
[2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]
[2024/10/17] MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling | [paper] | [code]
[2024/10/16] MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration | [paper] | [code]
[2024/10/02] Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics | [paper] | [code]
[2024/08/28] Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | [paper] | [code]
[2024/08/23] DrugAgent: Explainable Drug Repurposing Agent with Large Language Model-based Reasoning | [paper] | [code]
[2024/08/14] Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | [paper] | [code]
[2024/07/18] CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | [paper] | [code]
[2024/07/10] Virtual Agents for Alcohol Use Counseling: Exploring LLM-Powered Motivational Interviewing | [paper] | [code]
[2024/07/03] MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control | [paper] | [code]
[2024/07/02] MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | [paper] | [code]
[2024/04/23] ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning | [paper] | [code]
[2024/04/03] Empowering Biomedical Discovery with AI Agents | [paper] | [code]
[2024/02/20] Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues | [paper] | [code]
[2024/02/20] AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning | [paper] | [code]
[2024/02/15] Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients | [paper] | [code]
[2024/02/01] Generation, Distillation and Evaluation of Motivational Interviewing-Style Reflections with a Foundational Language Model | [paper] | [code]
[2023/12/19] Can ChatGPT be Your Personal Medical Assistant? | [paper] | [code]
[2023/10/03] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View | [paper] | [code]

Finance

[2025/02/25] LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena | [paper] | [code]
[2025/02/08] Agentic AI Systems Applied to tasks in Financial Services: Modeling and model risk management crews | [paper] | [code]
[2025/02/01] MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents | [paper] | [code]
[2025/01/08] FinSphere: A Conversational Stock Analysis Agent Equipped with Quantitative Tools based on Real-Time Database | [paper] | [code]
[2024/12/27] OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | [paper] | [code]
[2024/12/19] Beyond the Sum: Unlocking AI Agents Potential Through Market Forces | [paper] | [code]
[2024/11/07] Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research | [paper] | [code]
[2024/10/29] Enhancing Financial Question Answering with a Multi-Agent Reflection Framework | [paper] | [code]
[2024/09/19] Strategic Collusion of LLM Agents: Market Division in Multi-Commodity Competitions | [paper] | [code]
[2024/07/18] dzFinNlp at AraFinNLP: Improving Intent Detection in Financial Conversational Agents | [paper] | [code]
[2024/07/09] FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | [paper] | [code]
[2024/07/05] Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent | [paper] | [code]
[2024/05/07] Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework | [paper] | [code]

Software Engineering

[2025/02/19] An LLM-based Agent for Reliable Docker Environment Configuration | [paper] | [code]
[2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]
[2025/02/18] UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design | [paper] | [code]
[2025/02/14] The Ann Arbor Architecture for Agent-Oriented Programming | [paper] | [code]
[2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]
[2025/02/10] SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering | [paper] | [code]
[2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]
[2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]
[2024/12/24] Molly: Making Large Language Model Agents Solve Python Problem More Logically | [paper] | [code]
[2024/12/16] Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework | [paper] | [code]
[2024/11/07] CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | [paper] | [code]
[2024/10/29] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent | [paper] | [code]
[2024/10/09] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | [paper] | [code]
[2024/10/09] Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach | [paper] | [code]
[2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]
[2024/08/19] GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making | [paper] | [code]
[2024/08/13] Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | [paper] | [code]
[2024/08/05] LLM Agents Improve Semantic Code Search | [paper] | [code]
[2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]
[2024/07/01] Agentless: Demystifying LLM-based Software Engineering Agents | [paper] | [code]
[2024/06/13] Multi-Agent Software Development through Cross-Team Collaboration | [paper] | [code]
[2024/05/06] SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | [paper] | [code]
[2024/04/11] Behavior Trees Enable Structured Programming of Language Model Agents | [paper] | [code]
[2024/04/02] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | [paper] | [code]
[2024/03/02] SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code | [paper] | [code]
[2024/02/26] RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation | [paper] | [code]
[2024/02/19] WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment | [paper] | [code]
[2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]
[2024/02/01] Executable Code Actions Elicit Better LLM Agents | [paper] | [code]
[2023/12/28] Experiential Co-Learning of Software-Developing Agents | [paper] | [code]
[2023/12/20] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | [paper] | [code]
[2023/07/27] PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | [paper] | [code]
[2023/07/16] ChatDev: Communicative Agents for Software Development | [paper] | [code]
[2023/04/15] Self-collaboration Code Generation via ChatGPT | [paper] | [code]

Research

[2025/02/25] LAG: LLM agents for Leaderboard Auto Generation on Demanding | [paper] | [code]
[2025/02/20] MLGym: A New Framework and Benchmark for Advancing AI Research Agents | [paper] | [code]
[2025/02/07] Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | [paper] | [code]
[2025/01/08] Agent Laboratory: Using LLM Agents as Research Assistants | [paper] | [code]
[2024/10/17] Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents | [paper] | [code]
[2024/10/12] Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | [paper] | [code]
[2024/10/07] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | [paper] | [code]
[2024/10/07] ImProver: Agent-Based Automated Proof Optimization | [paper] | [code]
[2024/09/23] Towards a Realistic Long-Term Benchmark for Open-Web Research Agents | [paper] | [code]
[2024/09/17] CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | [paper] | [code]
[2024/09/12] DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | [paper] | [code]
[2024/09/11] SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | [paper] | [code]
[2024/09/10] Language agents achieve superhuman synthesis of scientific knowledge | [paper] | [code]
[2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]
[2024/08/26] MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents | [paper] | [code]
[2024/08/20] Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting | [paper] | [code]
[2024/06/13] ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents | [paper] | [code]
[2024/05/02] CACTUS: Chemistry Agent Connecting Tool-Usage to Science | [paper] | [code]
[2024/04/09] SurveyAgent: A Conversational System for Personalized and Efficient Research Survey | [paper] | [code]
[2024/02/28] Data Interpreter: An LLM Agent For Data Science | [paper] | [code]
[2024/02/18] SciAgent: Tool-augmented Language Models for Scientific Reasoning | [paper] | [code]
[2024/02/06] Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science | [paper] | [code]
[2024/01/08] MARG: Multi-Agent Review Generation for Scientific Papers | [paper] | [code]

Automation

Workflow

[2025/02/24] Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents | [paper] | [code]
[2025/02/11] EvoFlow: Evolving Diverse Agentic Workflows On The Fly | [paper] | [code]
[2025/02/07] nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow | [paper] | [code]
[2025/02/06] ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | [paper] | [code]
[2024/12/17] An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions | [paper] | [code]
[2024/12/15] LAW: Legal Agentic Workflows for Custody and Fund Services Contracts | [paper] | [code]
[2024/11/22] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | [paper] | [code]
[2024/11/12] BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks | [paper] | [code]
[2024/11/08] Game-theoretic LLM: Agent Workflow for Negotiation Games | [paper] | [code]
[2024/10/24] An LLM Agent for Automatic Geospatial Data Analysis | [paper] | [code]
[2024/10/17] From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching | [paper] | [code]
[2024/10/17] ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise | [paper] | [code]
[2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]
[2024/10/14] AFlow: Automating Agentic Workflow Generation | [paper] | [code]
[2024/10/10] Benchmarking Agentic Workflow Generation | [paper] | [code]
[2024/10/03] AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | [paper] | [code]
[2024/09/11] Agent Workflow Memory | [paper] | [code]
[2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]
[2024/07/15] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | [paper] | [code]
[2024/07/03] AgentInstruct: Toward Generative Teaching with Agentic Flows | [paper] | [code]
[2024/07/01] AutoFlow: Automated Workflow Generation for Large Language Model Agents | [paper] | [code]
[2024/06/21] Autonomous Agents for Collaborative Task under Information Asymmetry | [paper] | [code]
[2024/03/13] AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents | [paper] | [code]
[2024/03/05] ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary | [paper] | [code]

Automatic Evaluation

[2025/02/26] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | [paper] | [code]
[2025/02/25] Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent | [paper] | [code]
[2025/02/25] FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | [paper] | [code]
[2025/02/14] Automated Hypothesis Validation with Agentic Sequential Falsifications | [paper] | [code]
[2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]
[2025/01/17] Agent-as-Judge for Factual Summarization of Long Narratives | [paper] | [code]
[2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]
[2024/12/28] M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation | [paper] | [code]
[2024/12/10] Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models | [paper] | [code]
[2024/11/25] SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | [paper] | [code]
[2024/11/15] Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems | [paper] | [code]
[2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]
[2024/09/22] The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | [paper] | [code]
[2024/09/13] Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance | [paper] | [code]
[2024/05/23] ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | [paper] | [code]
[2024/03/28] MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation | [paper] | [code]
[2023/08/14] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | [paper] | [code]

Training

Fine tuning

[2025/02/24] Training a Generally Curious Agent | [paper] | [code]
[2025/02/19] UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text | [paper] | [code]
[2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]
[2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]
[2025/02/10] Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training | [paper] | [code]
[2025/01/10] Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains | [paper] | [code]
[2025/01/03] AgentRefine: Enhancing Agent Generalization through Refinement Tuning | [paper] | [code]
[2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]
[2024/12/30] Aviary: training language agents on challenging scientific tasks | [paper] | [code]
[2024/12/16] Virtual Agent-Based Communication Skills Training to Facilitate Health Persuasion Among Peers | [paper] | [code]
[2024/11/29] Training Agents with Weakly Supervised Feedback from Large Language Models | [paper] | [code]
[2024/11/21] Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning | [paper] | [code]
[2024/10/20] Training Language Models to Critique With Multi-agent Feedback | [paper] | [code]
[2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]
[2024/10/10] AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories | [paper] | [code]
[2024/07/25] Recursive Introspection: Teaching Language Model Agents How to Self-Improve | [paper] | [code]
[2024/06/11] CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation | [paper] | [code]
[2024/04/05] Social Skill Training with Large Language Models | [paper] | [code]
[2024/04/02] CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models | [paper] | [code]
[2024/03/29] Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | [paper] | [code]
[2024/03/21] ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy | [paper] | [code]
[2024/03/19] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | [paper] | [code]
[2024/02/23] AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | [paper] | [code]
[2024/02/21] Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | [paper] | [code]
[2024/02/18] Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | [paper] | [code]
[2024/01/10] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | [paper] | [code]
[2024/01/05] From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models | [paper] | [code]
[2023/12/22] Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning | [paper] | [code]
[2023/11/28] Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld | [paper] | [code]
[2023/10/19] AgentTuning: Enabling Generalized Agent Abilities for LLMs | [paper] | [code]
[2023/10/09] FireAct: Toward Language Agent Fine-tuning | [paper] | [code]
[2023/05/26] Training Socially Aligned Language Models on Simulated Social Interactions | [paper] | [code]

RL

[2025/02/25] AgentRM: Enhancing Agent Generalization with Reward Modeling | [paper] | [code]
[2025/02/09] Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning | [paper] | [code]
[2025/02/06] Multi-Agent Reinforcement Learning with Focal Diversity Optimization | [paper] | [code]
[2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]
[2024/11/26] LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble | [paper] | [code]
[2024/11/07] Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations | [paper] | [code]
[2024/11/06] From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning | [paper] | [code]
[2024/11/04] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | [paper] | [code]
[2024/10/11] Words as Beacons: Guiding RL Agents with High-Level Language Prompts | [paper] | [code]
[2024/10/10] MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization | [paper] | [code]
[2024/07/02] Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling | [paper] | [code]
[2024/06/26] Mental Modeling of Reinforcement Learning Agents by Language Models | [paper] | [code]
[2024/06/17] Input Conditioned Graph Generation for Language Agents | [paper] | [code]
[2024/06/05] LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | [paper] | [code]
[2024/06/03] Re-ReST: Reflection-Reinforced Self-Training for Language Agents | [paper] | [code]
[2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]
[2024/05/17] LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions | [paper] | [code]
[2024/05/16] Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | [paper] | [code]
[2024/05/01] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | [paper] | [code]
[2024/03/05] Language Guided Exploration for RL Agents in Text Environments | [paper] | [code]
[2024/02/17] Offline Training of Language Model Agents with Functions as Learnable Weights | [paper] | [code]
[2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]
[2023/10/25] MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning | [paper] | [code]
[2023/03/29] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks | [paper] | [code]

DPO

[2025/02/26] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | [paper] | [code]
[2025/01/03] SDPO: Segment-Level Direct Preference Optimization for Social Agents | [paper] | [code]
[2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]
[2024/05/31] Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training | [paper] | [code]

Scaling

Single-Agent Framework

[2025/02/26] TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | [paper] | [code]
[2025/02/14] Agentic Verification for Ambiguous Query Disambiguation | [paper] | [code]
[2025/02/12] SPeCtrum: A Grounded Framework for Multidimensional Identity Representation in LLM-Based Agent | [paper] | [code]
[2025/02/09] AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | [paper] | [code]
[2025/02/04] Adaptive Self-improvement LLM Agentic System for ML Library Development | [paper] | [code]
[2025/01/31] Enabling Autonomic Microservice Management through Self-Learning Agents | [paper] | [code]
[2024/12/28] OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System | [paper] | [code]
[2024/12/21] Self-guided Knowledgeable Network of Thoughts: Amplifying Reasoning with Large Language Models | [paper] | [code]
[2024/12/15] AgentPS: Agentic Process Supervision for Multi-modal Content Quality Assurance through Multi-round QA | [paper] | [code]
[2024/12/11] A Multimodal Social Agent | [paper] | [code]
[2024/12/11] Federated In-Context LLM Agent Learning | [paper] | [code]
[2024/12/04] How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | [paper] | [code]
[2024/12/02] SAUP: Situation Awareness Uncertainty Propagation on LLM Agent | [paper] | [code]
[2024/12/01] Towards Adaptive Mechanism Activation in Language Agent | [paper] | [code]
[2024/11/20] MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning | [paper] | [code]
[2024/11/16] IntentGPT: Few-shot Intent Discovery with Large Language Models | [paper] | [code]
[2024/11/04] DynaSaur: Large Language Agents Beyond Predefined Actions | [paper] | [code]
[2024/11/04] CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments | [paper] | [code]
[2024/10/29] ADAM: An Embodied Causal Agent in Open-World Environments | [paper] | [code]
[2024/10/27] TrajAgent: An Agent Framework for Unified Trajectory Modelling | [paper] | [code]
[2024/10/22] Adsorb-Agent: Autonomous Identification of Stable Adsorption Configurations via Large Language Model Agent | [paper] | [code]
[2024/10/11] Encoding Agent Trajectories as Representations with Sequence Transformers | [paper] | [code]
[2024/10/10] Agents Thinking Fast and Slow: A Talker-Reasoner Architecture | [paper] | [code]
[2024/10/08] AgentSquare: Automatic LLM Agent Search in Modular Design Space | [paper] | [code]
[2024/10/08] Applying Refusal-Vector Ablation to Llama 3.1 70B Agents | [paper] | [code]
[2024/09/24] MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | [paper] | [code]
[2024/09/19] Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation | [paper] | [code]
[2024/09/15] Automatic Control With Human-Like Reasoning: Exploring Language Model Embodied Air Traffic Agents | [paper] | [code]
[2024/09/12] Self-Supervised Inference of Agents in Trustless Environments | [paper] | [code]
[2024/09/05] From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | [paper] | [code]
[2024/09/05] Rx Strategist: Prescription Verification using LLM Agents System | [paper] | [code]
[2024/09/03] AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction | [paper] | [code]
[2024/08/26] AgentMove: A Large Language Model based Agentic Framework for Zero-shot Next Location Prediction | [paper] | [code]
[2024/08/19] Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | [paper] | [code]
[2024/08/13] Causal Agent based on Large Language Model | [paper] | [code]
[2024/08/02] Coalitions of Large Language Models Increase the Robustness of AI Agents | [paper] | [code]
[2024/07/27] AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment of Bullying and Joking in Peer Interactions in Schools | [paper] | [code]
[2024/07/25] Enhancing Agent Learning through World Dynamics Modeling | [paper] | [code]
[2024/07/25] RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models | [paper] | [code]
[2024/07/16] Preemptive Detection and Correction of Misaligned Actions in LLM Agents | [paper] | [code]
[2024/07/15] Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | [paper] | [code]
[2024/07/02] Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents | [paper] | [code]
[2024/06/24] OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | [paper] | [code]
[2024/06/07] SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals | [paper] | [code]
[2024/05/25] AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning | [paper] | [code]
[2024/05/24] Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models | [paper] | [code]
[2024/05/16] Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents | [paper] | [code]
[2024/04/30] Large Language Model Agent for Fake News Detection | [paper] | [code]
[2024/04/28] Logic Agent: Enhancing Validity with Logic Rule Invocation | [paper] | [code]
[2024/04/13] LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration | [paper] | [code]
[2024/04/01] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | [paper] | [code]
[2024/03/29] ITCMA: A Generative Agent Based on a Computational Consciousness Structure | [paper] | [code]
[2024/02/25] Bootstrapping Cognitive Agents with a Large Language Model | [paper] | [code]
[2024/02/24] Empowering Large Language Model Agents through Action Learning | [paper] | [code]
[2024/02/20] Soft Self-Consistency Improves Language Model Agents | [paper] | [code]
[2024/02/04] NavHint: Vision and Language Navigation Agent with a Hint Generator | [paper] | [code]
[2024/01/05] AFSPP: Agent Framework for Shaping Preference and Personality with Large Language Models | [paper] | [code]
[2023/11/23] Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach | [paper] | [code]
[2023/11/02] ProAgent: From Robotic Process Automation to Agentic Process Automation | [paper] | [code]
[2023/10/16] CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization | [paper] | [code]
[2023/09/29] Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency | [paper] | [code]
[2023/09/14] Agents: An Open-source Framework for Autonomous Language Agents | [paper] | [code]
[2023/09/08] A Versatile Graph Learning Approach through LLM-based Agent | [paper] | [code]
[2023/09/05] Cognitive Architectures for Language Agents | [paper] | [code]
[2023/05/27] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks | [paper] | [code]
[2023/05/25] Voyager: An Open-Ended Embodied Agent with Large Language Models | [paper] | [code]

Multi-Agent System

[2025/02/27] M³Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging | [paper] | [code]
[2025/02/26] Stay Focused: Problem Drift in Multi-Agent Debate | [paper] | [code]
[2025/02/26] Voting or Consensus? Decision-Making in Multi-Agent Debate | [paper] | [code]
[2025/02/25] Enhancing Text Classification with a Novel Multi-Agent Collaboration Framework Leveraging BERT | [paper] | [code]
[2025/02/25] A Cooperative Multi-Agent Framework for Zero-Shot Named Entity Recognition | [paper] | [code]
[2025/02/25] Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent | [paper] | [code]
[2025/02/25] FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | [paper] | [code]
[2025/02/24] MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions | [paper] | [code]
[2025/02/24] Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration | [paper] | [code]
[2025/02/24] METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling | [paper] | [code]
[2025/02/23] The Hidden Strength of Disagreement: Unraveling the Consensus-Diversity Tradeoff in Adaptive Multi-Agent Systems | [paper] | [code]
[2025/02/20] Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization | [paper] | [code]
[2025/02/20] CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | [paper] | [code]
[2025/02/17] Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | [paper] | [code]
[2025/02/17] HARBOR: Exploring Persona Dynamics in Multi-Agent Competition | [paper] | [code]
[2025/02/15] Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation | [paper] | [code]
[2025/02/13] PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology | [paper] | [code]
[2025/02/13] Mind the Gaps: Logical English, Prolog, and Multi-agent Systems for Autonomous Vehicles | [paper] | [code]
[2025/02/12] Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation | [paper] | [code]
[2025/02/12] If Multi-Agent Debate is the Answer, What is the Question? | [paper] | [code]
[2025/02/11] Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification | [paper] | [code]
[2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]
[2025/02/10] KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment | [paper] | [code]
[2025/02/09] Preventing Rogue Agents Improves Multi-Agent Collaboration | [paper] | [code]
[2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]
[2025/02/08] Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction | [paper] | [code]
[2025/02/07] S²-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency | [paper] | [code]
[2025/02/06] Multi-Agent Reinforcement Learning with Focal Diversity Optimization | [paper] | [code]
[2025/02/06] Enhancing Online Learning Efficiency Through Heterogeneous Resource Integration with a Multi-Agent RAG System | [paper] | [code]
[2025/02/06] Multi-agent Architecture Search via Agentic Supernet | [paper] | [code]
[2025/02/04] Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives | [paper] | [code]
[2025/02/04] Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | [paper] | [code]
[2025/02/03] PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback | [paper] | [code]
[2025/02/03] ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution | [paper] | [code]
[2025/02/02] Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | [paper] | [code]
[2025/02/02] Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search | [paper] | [code]
[2025/01/29] Layered Chain-of-Thought Prompting for Multi-Agent LLM Systems: A Comprehensive Approach to Explainable Large Language Models | [paper] | [code]
[2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]
[2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]
[2025/01/24] Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game | [paper] | [code]
[2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]
[2025/01/22] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | [paper] | [code]
[2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]
[2025/01/16] AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling | [paper] | [code]
[2025/01/14] Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering | [paper] | [code]
[2025/01/05] LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models | [paper] | [code]
[2025/01/02] Harnessing Multi-Agent LLMs for Complex Engineering Problem-Solving: A Framework for Senior Design Projects | [paper] | [code]
[2024/12/30] Distributed Mixture-of-Agents for Edge Inference with Large Language Models | [paper] | [code]
[2024/12/28] M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation | [paper] | [code]
[2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]
[2024/12/24] Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering | [paper] | [code]
[2024/12/22] Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | [paper] | [code]
[2024/12/22] A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops | [paper] | [code]
[2024/12/20] Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework | [paper] | [code]
[2024/12/19] PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | [paper] | [code]
[2024/12/18] Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates | [paper] | [code]
[2024/12/15] Cultural Palette: Pluralising Culture Alignment via Multi-agent Palette | [paper] | [code]
[2024/12/13] AutoPatent: A Multi-Agent Framework for Automatic Patent Generation | [paper] | [code]
[2024/12/12] DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction | [paper] | [code]
[2024/12/11] NAT-NL2GQL: A Novel Multi-Agent Framework for Translating Natural Language to Graph Query Language | [paper] | [code]
[2024/12/10] AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework | [paper] | [code]
[2024/12/07] SLA Management in Reconfigurable Multi-Agent RAG: A Systems Approach to Question Answering | [paper] | [code]
[2024/12/06] Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate | [paper] | [code]
[2024/12/06] Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications | [paper] | [code]
[2024/12/06] Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System | [paper] | [code]
[2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]
[2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]
[2024/12/01] Multi-Agent Collaboration in Incident Response with Large Language Models | [paper] | [code]
[2024/11/28] MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification | [paper] | [code]
[2024/11/21] PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation | [paper] | [code]
[2024/11/21] Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework | [paper] | [code]
[2024/11/18] The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | [paper] | [code]
[2024/11/12] BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks | [paper] | [code]
[2024/11/11] Using Generative AI and Multi-Agents to Provide Automatic Feedback | [paper] | [code]
[2024/11/09] Mixture of Knowledge Minigraph Agents for Literature Review Generation | [paper] | [code]
[2024/11/05] SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction | [paper] | [code]
[2024/11/05] SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | [paper] | [code]
[2024/11/01] DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems | [paper] | [code]
[2024/10/30] ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate | [paper] | [code]
[2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]
[2024/10/29] MARCO: Multi-Agent Real-time Chat Orchestration | [paper] | [code]
[2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]
[2024/10/27] AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | [paper] | [code]
[2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]
[2024/10/23] GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration | [paper] | [code]
[2024/10/22] Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain Annotation | [paper] | [code]
[2024/10/19] An Electoral Approach to Diversify LLM-based Multi-Agent Collective Decision-Making | [paper] | [code]
[2024/10/18] Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | [paper] | [code]
[2024/10/17] AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning | [paper] | [code]
[2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]
[2024/10/13] LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | [paper] | [code]
[2024/10/12] Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | [paper] | [code]
[2024/10/11] JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | [paper] | [code]
[2024/10/11] PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents | [paper] | [code]
[2024/10/10] AI-Press: A Multi-Agent News Generating and Feedback Simulation System Powered by Large Language Models | [paper] | [code]
[2024/10/10] Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining | [paper] | [code]
[2024/10/10] Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System | [paper] | [code]
[2024/10/10] Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions | [paper] | [code]
[2024/10/10] Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks | [paper] | [code]
[2024/10/09] Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach | [paper] | [code]
[2024/10/07] Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates | [paper] | [code]
[2024/10/06] MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems | [paper] | [code]
[2024/10/03] Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions | [paper] | [code]
[2024/10/03] Agents' Room: Narrative Generation through Multi-step Collaboration | [paper] | [code]
[2024/10/03] Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration | [paper] | [code]
[2024/10/03] ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration | [paper] | [code]
[2024/10/03] AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | [paper] | [code]
[2024/10/02] RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | [paper] | [code]
[2024/10/02] Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics | [paper] | [code]
[2024/09/21] Towards Automated Patent Workflows: AI-Orchestrated Multi-Agent Framework for Intellectual Property Management and Analysis | [paper] | [code]
[2024/09/21] GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion | [paper] | [code]
[2024/09/20] Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts | [paper] | [code]
[2024/09/18] MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | [paper] | [code]
[2024/09/17] The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives | [paper] | [code]
[2024/09/16] Instigating Cooperation among LLM Agents Using Adaptive Information Modulation | [paper] | [code]
[2024/09/14] Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models | [paper] | [code]
[2024/09/12] Knowledge Tagging with Large Language Model based Multi-Agent System | [paper] | [code]
[2024/09/11] Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs | [paper] | [code]
[2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]
[2024/09/06] Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | [paper] | [code]
[2024/09/05] xLAM: A Family of Large Action Models to Empower AI Agent Systems | [paper] | [code]
[2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]
[2024/08/28] BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | [paper] | [code]
[2024/08/27] AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems | [paper] | [code]
[2024/08/24] Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | [paper] | [code]
[2024/08/22] MuMA-ToM: Multi-modal Multi-Agent Theory of Mind | [paper] | [code]
[2024/08/21] DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | [paper] | [code]
[2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]
[2024/08/15] MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL | [paper] | [code]
[2024/08/15] Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework | [paper] | [code]
[2024/08/14] Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | [paper] | [code]
[2024/08/08] Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate | [paper] | [code]
[2024/08/05] ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems | [paper] | [code]
[2024/08/05] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | [paper] | [code]
[2024/07/23] LawLuo: A Multi-Agent Collaborative Framework for Multi-Round Chinese Legal Consultation | [paper] | [code]
[2024/07/21] Multi-Agent Causal Discovery Using Large Language Models | [paper] | [code]
[2024/07/19] NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication | [paper] | [code]
[2024/07/17] Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models | [paper] | [code]
[2024/07/16] InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains | [paper] | [code]
[2024/07/13] Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks | [paper] | [code]
[2024/07/13] Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues | [paper] | [code]
[2024/07/10] Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities | [paper] | [code]
[2024/07/09] FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | [paper] | [code]
[2024/07/09] Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | [paper] | [code]
[2024/07/04] Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems | [paper] | [code]
[2024/07/03] MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control | [paper] | [code]
[2024/06/17] Improving Multi-Agent Debate with Sparse Communication Topology | [paper] | [code]
[2024/06/13] Multi-Agent Software Development through Cross-Team Collaboration | [paper] | [code]
[2024/06/11] CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation | [paper] | [code]
[2024/06/07] Mixture-of-Agents Enhances Large Language Model Capabilities | [paper] | [code]
[2024/06/05] Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework | [paper] | [code]
[2024/06/04] Chain of Agents: Large Language Models Collaborating on Long-Context Tasks | [paper] | [code]
[2024/06/03] Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | [paper] | [code]
[2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]
[2024/05/23] CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System | [paper] | [code]
[2024/05/20] (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | [paper] | [code]
[2024/05/10] LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | [paper] | [code]
[2024/05/07] Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework | [paper] | [code]
[2024/05/06] Persona Inconstancy in Multi-Agent LLM Collaboration: Conformity, Confabulation, and Impersonation | [paper] | [code]
[2024/05/05] Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation | [paper] | [code]
[2024/04/25] Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents | [paper] | [code]
[2024/04/23] ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning | [paper] | [code]
[2024/04/14] Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation | [paper] | [code]
[2024/04/12] Leveraging Multi-AI Agents for Cross-Domain Knowledge Discovery | [paper] | [code]
[2024/04/09] Foundation Models to the Rescue: Deadlock Resolution in Connected Multi-Robot Systems | [paper] | [code]
[2024/04/08] 360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System | [paper] | [code]
[2024/04/06] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | [paper] | [code]
[2024/04/02] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | [paper] | [code]
[2024/04/02] CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models | [paper] | [code]
[2024/03/26] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution | [paper] | [code]
[2024/03/22] CACA Agent: Capability Collaboration based AI Agent | [paper] | [code]
[2024/03/21] Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering | [paper] | [code]
[2024/03/19] Embodied LLM Agents Learn to Cooperate in Organized Teams | [paper] | [code]
[2024/03/12] Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | [paper] | [code]
[2024/03/02] AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | [paper] | [code]
[2024/02/28] Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? | [paper] | [code]
[2024/02/26] Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | [paper] | [code]
[2024/02/26] LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | [paper] | [code]
[2024/02/21] LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain | [paper] | [code]
[2024/02/18] Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | [paper] | [code]
[2024/02/18] LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | [paper] | [code]
[2024/02/15] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | [paper] | [code]
[2024/02/03] More Agents Is All You Need | [paper] | [code]
[2024/02/02] Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions | [paper] | [code]
[2024/02/02] A Multi-Agent Conversational Recommender System | [paper] | [code]
[2024/01/11] Combating Adversarial Attacks with Multi-Agent Debate | [paper] | [code]
[2024/01/08] MARG: Multi-Agent Review Generation for Scientific Papers | [paper] | [code]
[2024/01/08] SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | [paper] | [code]
[2024/01/08] Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet | [paper] | [code]
[2023/12/20] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | [paper] | [code]
[2023/10/31] Multi-Agent Consensus Seeking via Large Language Models | [paper] | [code]
[2023/10/25] MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning | [paper] | [code]
[2023/08/22] ProAgent: Building Proactive Cooperative Agents with Large Language Models | [paper] | [code]
[2023/08/21] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors | [paper] | [code]
[2023/08/14] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | [paper] | [code]
[2023/08/01] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | [paper] | [code]
[2023/06/05] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents | [paper] | [code]
[2023/05/31] Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models | [paper] | [code]
[2023/05/30] Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | [paper] | [code]
[2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]
[2023/04/24] ChatLLM Network: More brains, More intelligence | [paper] | [code]

Stability

Safety

[2025/02/20] CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | [paper] | [code]
[2025/02/18] AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks | [paper] | [code]
[2025/02/17] "Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | [paper] | [code]
[2025/02/01] ALU: Agentic LLM Unlearning | [paper] | [code]
[2025/01/28] Context is Key for Agent Security | [paper] | [code]
[2024/12/21] The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents | [paper] | [code]
[2024/12/16] Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework | [paper] | [code]
[2024/12/09] The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap | [paper] | [code]
[2024/11/08] Towards Low-Resource Harmful Meme Detection with LMM Agents | [paper] | [code]
[2024/11/06] MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | [paper] | [code]
[2024/11/04] Attacking Vision-Language Computer Agents via Pop-ups | [paper] | [code]
[2024/10/22] AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents | [paper] | [code]
[2024/10/18] Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents | [paper] | [code]
[2024/10/11] AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | [paper] | [code]
[2024/10/09] I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy | [paper] | [code]
[2024/09/28] SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | [paper] | [code]
[2024/09/17] EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage | [paper] | [code]
[2024/09/13] AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents | [paper] | [code]
[2024/08/20] Athena: Safe Autonomous Agents with Verbal Contrastive Learning | [paper] | [code]
[2024/08/05] Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | [paper] | [code]
[2024/07/23] RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | [paper] | [code]
[2024/06/05] BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents | [paper] | [code]
[2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]
[2024/05/24] Hacc-Man: An Arcade Game for Jailbreaking LLMs | [paper] | [code]
[2024/03/02] AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | [paper] | [code]
[2024/02/17] Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents | [paper] | [code]
[2024/02/16] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages | [paper] | [code]
[2024/02/02] TrustAgent: Towards Safe and Trustworthy LLM-based Agents | [paper] | [code]
[2024/01/11] Combating Adversarial Attacks with Multi-Agent Debate | [paper] | [code]
[2023/11/17] Testing Language Model Agents Safely in the Wild | [paper] | [code]

Bias

[2025/01/29] Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models | [paper] | [code]
[2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]
[2024/12/20] Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework | [paper] | [code]
[2024/11/12] Mitigating Bias in Queer Representation within Large Language Models: A Collaborative Agent Approach | [paper] | [code]
[2024/10/06] MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems | [paper] | [code]
[2024/10/03] Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions | [paper] | [code]
[2024/05/23] ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | [paper] | [code]
[2024/04/23] Aligning LLM Agents by Learning Latent Preference from User Edits | [paper] | [code]
[2024/02/19] Polarization of Autonomous Generative AI Agents Under Echo Chambers | [paper] | [code]
[2024/02/14] Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications | [paper] | [code]
[2024/01/09] Agent Alignment in Evolving Social Norms | [paper] | [code]

Hallucination

[2025/02/26] Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents | [paper] | [code]
[2025/02/14] Automated Hypothesis Validation with Agentic Sequential Falsifications | [paper] | [code]
[2025/02/04] Position: Stop Acting Like Language Model Agents Are Normal Agents | [paper] | [code]
[2025/02/03] SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models | [paper] | [code]
[2025/01/19] Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks | [paper] | [code]
[2024/11/25] Enhancing Multi-Agent Consensus through Third-Party LLM Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models | [paper] | [code]
[2024/11/12] SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | [paper] | [code]
[2024/07/08] DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations | [paper] | [code]
[2024/06/29] BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science | [paper] | [code]
[2024/06/17] Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector | [paper] | [code]
[2024/06/05] Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework | [paper] | [code]
[2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]
[2024/02/13] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | [paper] | [code]

Infrastructure

Benchmark&Evaluation

[2025/02/26] TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | [paper] | [code]
[2025/02/25] RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction | [paper] | [code]
[2025/02/20] MLGym: A New Framework and Benchmark for Advancing AI Research Agents | [paper] | [code]
[2025/02/19] DataSciBench: An LLM Agent Benchmark for Data Science | [paper] | [code]
[2025/02/13] EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | [paper] | [code]
[2025/02/07] Evaluating Personality Traits in Large Language Models: Insights from Psychological Questionnaires | [paper] | [code]
[2025/02/06] Robotouille: An Asynchronous Planning Benchmark for LLM Agents | [paper] | [code]
[2025/02/01] Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents | [paper] | [code]
[2025/01/21] EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | [paper] | [code]
[2024/12/23] LegalAgentBench: Evaluating LLM Agents in Legal Domain | [paper] | [code]
[2024/12/19] Agent-SafetyBench: Evaluating the Safety of LLM Agents | [paper] | [code]
[2024/12/18] TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | [paper] | [code]
[2024/12/18] ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning | [paper] | [code]
[2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]
[2024/12/02] Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | [paper] | [code]
[2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]
[2024/10/28] Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games | [paper] | [code]
[2024/10/25] AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | [paper] | [code]
[2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]
[2024/10/23] MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control | [paper] | [code]
[2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]
[2024/10/15] Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs | [paper] | [code]
[2024/10/11] JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | [paper] | [code]
[2024/10/11] AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | [paper] | [code]
[2024/10/10] Benchmarking Agentic Workflow Generation | [paper] | [code]
[2024/10/09] MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | [paper] | [code]
[2024/10/09] Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | [paper] | [code]
[2024/10/09] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | [paper] | [code]
[2024/10/07] Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates | [paper] | [code]
[2024/10/07] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | [paper] | [code]
[2024/09/23] Towards a Realistic Long-Term Benchmark for Open-Web Research Agents | [paper] | [code]
[2024/09/17] CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | [paper] | [code]
[2024/09/12] DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | [paper] | [code]
[2024/09/11] SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | [paper] | [code]
[2024/09/02] ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | [paper] | [code]
[2024/08/28] BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | [paper] | [code]
[2024/08/19] BLADE: Benchmarking Language Model Agents for Data-Driven Science | [paper] | [code]
[2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]
[2024/08/12] VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | [paper] | [code]
[2024/07/26] OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation | [paper] | [code]
[2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]
[2024/07/25] PersonaGym: Evaluating Persona Agents and LLMs | [paper] | [code]
[2024/07/23] AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | [paper] | [code]
[2024/07/22] AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? | [paper] | [code]
[2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]
[2024/07/11] GTA: A Benchmark for General Tool Agents | [paper] | [code]
[2024/07/05] Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent | [paper] | [code]
[2024/07/01] MIRAI: Evaluating LLM Agents for Event Forecasting | [paper] | [code]
[2024/07/01] ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions | [paper] | [code]
[2024/07/01] Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents | [paper] | [code]
[2024/06/28] Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task | [paper] | [code]
[2024/06/13] ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents | [paper] | [code]
[2024/06/13] StreamBench: Towards Benchmarking Continuous Improvement of Language Agents | [paper] | [code]
[2024/06/07] WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | [paper] | [code]
[2024/06/07] GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | [paper] | [code]
[2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]
[2024/05/23] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | [paper] | [code]
[2024/05/13] AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | [paper] | [code]
[2024/05/01] WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting | [paper] | [code]
[2024/04/23] Evaluating Tool-Augmented Agents in Remote Sensing Platforms | [paper] | [code]
[2024/04/22] How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO | [paper] | [code]
[2024/04/15] MMInA: Benchmarking Multihop Multimodal Internet Agents | [paper] | [code]
[2024/04/11] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | [paper] | [code]
[2024/04/09] AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | [paper] | [code]
[2024/04/05] GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models | [paper] | [code]
[2024/03/29] DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries | [paper] | [code]
[2024/03/26] Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies | [paper] | [code]
[2024/03/20] SocialBench: Sociality Evaluation of Role-Playing Conversational Agents | [paper] | [code]
[2024/03/18] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | [paper] | [code]
[2024/03/18] Tur[k]ingBench: A Challenge Benchmark for Web Agents | [paper] | [code]
[2024/03/13] Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation | [paper] | [code]
[2024/03/05] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | [paper] | [code]
[2024/02/27] Evaluating Very Long-Term Conversational Memory of LLM Agents | [paper] | [code]
[2024/02/27] Benchmarking Data Science Agents | [paper] | [code]
[2024/02/19] A Critical Evaluation of AI Feedback for Aligning Large Language Models | [paper] | [code]
[2024/02/18] Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | [paper] | [code]
[2024/02/18] MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization | [paper] | [code]
[2024/02/05] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models | [paper] | [code]
[2024/02/02] TravelPlanner: A Benchmark for Real-World Planning with Language Agents | [paper] | [code]
[2024/01/02] CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | [paper] | [code]
[2023/12/28] How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation | [paper] | [code]
[2023/12/26] RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models | [paper] | [code]
[2023/11/16] ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code | [paper] | [code]
[2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]
[2023/10/24] FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions | [paper] | [code]
[2023/10/09] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | [paper] | [code]
[2023/10/02] SmartPlay: A Benchmark for LLMs as Intelligent Agents | [paper] | [code]
[2023/10/01] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | [paper] | [code]
[2023/08/11] BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | [paper] | [code]
[2023/08/07] AgentBench: Evaluating LLMs as Agents | [paper] | [code]
[2023/04/27] ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time | [paper] | [code]

Environment&Platform

[2025/02/14] The Ann Arbor Architecture for Agent-Oriented Programming | [paper] | [code]
[2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]
[2024/11/05] SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction | [paper] | [code]
[2024/08/09] AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | [paper] | [code]
[2024/08/06] OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | [paper] | [code]
[2024/07/23] OpenHands: An Open Platform for AI Software Developers as Generalist Agents | [paper] | [code]
[2024/07/14] AutoGRAMS: Autonomous Graphical Agent Modeling Software | [paper] | [code]
[2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]
[2024/07/08] Coding Reliable LLM-based Integrated Task and Knowledge Agents with GenieWorksheets | [paper] | [code]
[2024/06/06] AgentGym: Evolving Large Language Model-based Agents across Diverse Environments | [paper] | [code]
[2024/05/23] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | [paper] | [code]
[2024/02/27] OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | [paper] | [code]
[2023/03/14] CB2: Collaborative Natural Language Interaction Research Platform | [paper] | [code]

Dataset

[2025/02/09] MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents | [paper] | [code]
[2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]
[2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]
[2025/01/14] Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models | [paper] | [code]
[2024/12/30] Plancraft: an evaluation dataset for planning with LLM agents | [paper] | [code]
[2024/12/28] BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters | [paper] | [code]
[2024/12/24] Explainable Multi-Modal Data Exploration in Natural Language via LLM Agent | [paper] | [code]
[2024/12/06] CALICO: Conversational Agent Localization via Synthetic Data Generation | [paper] | [code]
[2024/11/28] MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification | [paper] | [code]
[2024/11/21] Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning | [paper] | [code]
[2024/10/18] Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | [paper] | [code]
[2024/10/10] AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories | [paper] | [code]
[2024/09/06] Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | [paper] | [code]
[2024/08/22] MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents | [paper] | [code]
[2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]
[2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]
[2024/06/16] GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | [paper] | [code]
[2024/03/19] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | [paper] | [code]
[2024/02/27] OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | [paper] | [code]
[2023/07/31] HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution | [paper] | [code]

Others

[2025/02/20] Optimizing Model Selection for Compound AI Systems | [paper] | [code]
[2024/12/03] Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction | [paper] | [code]
[2024/03/18] EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents | [paper] | [code]

For Tasks:

explore AI agent architectures, conduct research on LLM-based agents, implement agent frameworks, fine-tune language models for specific tasks, evaluate agent performance

For Jobs:

AI researcher, data scientist, machine learning engineer, natural language processing engineer, research scientist

Alternative AI tools for LLM-Agents-Papers

Similar Open Source Tools

LLM-Agents-Papers

GitHub | Stars: 1.3k

awesome-LLM-game-agent-papers

This repository provides a comprehensive survey of research papers on large language model (LLM)-based game agents. LLMs are powerful AI models that can understand and generate human language, and they have shown great promise for developing intelligent game agents. This survey covers a wide range of topics, including adventure games, crafting and exploration games, simulation games, competition games, cooperation games, communication games, and action games. For each topic, the survey provides an overview of the state-of-the-art research, as well as a discussion of the challenges and opportunities for future work.

GitHub | Stars: 469

do-research-in-AI

This repository is a collection of research lectures and experience sharing posts from frontline researchers in the field of AI. It aims to help individuals upgrade their research skills and knowledge through insightful talks and experiences shared by experts. The content covers various topics such as evaluating research papers, choosing research directions, research methodologies, and tips for writing high-quality scientific papers. The repository also includes discussions on academic career paths, research ethics, and the emotional aspects of research work. Overall, it serves as a valuable resource for individuals interested in advancing their research capabilities in the field of AI.

GitHub | Stars: 61

LLM-in-Vision

Recent LLM (Large Language Models)-based CV and multi-modal works.

GitHub | Stars: 743

Paper-Reading-ConvAI

Paper-Reading-ConvAI is a repository that contains a list of papers, datasets, and resources related to Conversational AI, mainly encompassing dialogue systems and natural language generation. The repository is constantly being updated.

GitHub | Stars: 1.0k

awesome-and-novel-works-in-slam

This repository contains a curated list of cutting-edge works in Simultaneous Localization and Mapping (SLAM). It includes research papers, projects, and tools related to various aspects of SLAM, such as 3D reconstruction, semantic mapping, novel algorithms, large-scale mapping, and more. The repository aims to showcase the latest advancements in SLAM technology and provide resources for researchers and practitioners in the field.

GitHub | Stars: 59

Semi-Auto-NovelAI-to-Pixiv

Semi-Auto-NovelAI-to-Pixiv is a powerful tool that enables batch image generation with NovelAI, along with various other useful features in a super user-friendly interface. It allows users to create images, generate random images, upload images to Pixiv, apply filters, enhance images, add watermarks, and more. The tool also supports video-to-image conversion and various image manipulation tasks. It offers a seamless experience for users looking to automate image processing tasks.

GitHub | Stars: 242

Awesome-Code-LLM

GitHub | Stars: 2.3k

Awesome_papers_on_LLMs_detection

This repository is a curated list of papers focused on the detection of Large Language Models (LLMs)-generated content. It includes the latest research papers covering detection methods, datasets, attacks, and more. The repository is regularly updated to include the most recent papers in the field.

GitHub | Stars: 147

prompt-in-context-learning

An open-source engineering guide for prompt-in-context learning from EgoAlpha Lab, organized into Papers, Playground, Prompt Engineering, ChatGPT Prompt, and LLMs Usage Guide. It is a fresh, daily-updated resource for in-context learning and prompt engineering. As Artificial General Intelligence (AGI) approaches, the authors encourage readers to become super learners and position themselves at the forefront of this era. The resources include: Papers, the latest papers on in-context learning, prompt engineering, agents, and foundation models; Playground, large language models (LLMs) that enable prompt experimentation; Prompt Engineering, prompt techniques for leveraging LLMs; ChatGPT Prompt, prompt examples that can be applied at work and in daily life; and LLMs Usage Guide, a method for quickly getting started with LLMs by using LangChain. The authors predict that in the future there will likely be two types of people on Earth (perhaps even on Mars, but that's a question for Musk): those who enhance their abilities through the use of AIGC, and those whose jobs are replaced by AI automation.

GitHub | Stars: 1.5k

Awesome-LLM-Interpretability

Awesome-LLM-Interpretability is a curated list of materials related to LLM (Large Language Models) interpretability, covering tutorials, code libraries, surveys, videos, papers, and blogs. It includes resources on transformer mechanistic interpretability, visualization, interventions, probing, fine-tuning, feature representation, learning dynamics, knowledge editing, hallucination detection, and redundancy analysis. The repository aims to provide a comprehensive overview of tools, techniques, and methods for understanding and interpreting the inner workings of large language models.

GitHub | Stars: 130

ChatGPT-On-CS

ChatGPT-On-CS is an intelligent chatbot tool based on large models, supporting various platforms like WeChat, Taobao, Bilibili, Douyin, Weibo, and more. It can handle text, voice, and image inputs, access external resources through plugins, and customize enterprise AI applications based on proprietary knowledge bases. Users can set custom replies, utilize ChatGPT interface for intelligent responses, send images and binary files, and create personalized chatbots using knowledge base files. The tool also features platform-specific plugin systems for accessing external resources and supports enterprise AI applications customization.

GitHub | Stars: 2.2k

LLM-IR-Bias-Fairness-Survey

GitHub | Stars: 52

ChatGPT-On-CS

This project is an intelligent customer-service dialogue tool based on large models. It supports platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, and Zhihu. You can choose GPT-3.5, GPT-4.0, or Lazy Treasure Box as the backend (more will be supported in the future). The tool can process text, voice, and images, access external resources such as the operating system and the Internet through plugins, and support enterprise AI applications customized on each enterprise's own knowledge base.

GitHub | Stars: 768

LLM-for-misinformation-research

LLM-for-misinformation-research is a curated paper list of misinformation research using large language models (LLMs). The repository covers methods for detection and verification, tools for fact-checking complex claims, decision-making and explanation, claim matching, post-hoc explanation generation, and other tasks related to combating misinformation. It includes papers on fake news detection, rumor detection, fact verification, and more, showcasing the application of LLMs in various aspects of misinformation research.

GitHub | Stars: 78

LLM-Dojo

LLM-Dojo is an open-source platform for learning and practicing large models, providing a framework for building custom large model training processes, implementing various tricks and principles in the llm_tricks module, and mainstream model chat templates. The project includes an open-source large model training framework, detailed explanations and usage of the latest LLM tricks, and a collection of mainstream model chat templates. The term 'Dojo' symbolizes a place dedicated to learning and practice, borrowing its meaning from martial arts training.

GitHub | Stars: 612

For similar tasks

LLM-Agents-Papers

GitHub | Stars: 1.3k

OSWorld

OSWorld is a benchmarking tool designed to evaluate multimodal agents for open-ended tasks in real computer environments. It provides a platform for running experiments, setting up virtual machines, and interacting with the environment using Python scripts. Users can install the tool on their desktop or server, manage dependencies with Conda, and run benchmark tasks. The tool supports actions like executing commands, checking for specific results, and evaluating agent performance. OSWorld aims to facilitate research in AI by providing a standardized environment for testing and comparing different agent baselines.

GitHub | Stars: 1.7k

council

Council is an open-source platform designed for the rapid development and deployment of customized generative AI applications using teams of agents. It extends the LLM tool ecosystem by providing advanced control flow and scalable oversight for AI agents. Users can create sophisticated agents with predictable behavior by leveraging Council's powerful approach to control flow using Controllers, Filters, Evaluators, and Budgets. The framework allows for automated routing between agents, comparing, evaluating, and selecting the best results for a task. Council aims to facilitate packaging and deploying agents at scale on multiple platforms while enabling enterprise-grade monitoring and quality control.

GitHub | Stars: 815

ComfyBench

ComfyBench is a comprehensive benchmark tool designed to evaluate agents' ability to design collaborative AI systems in ComfyUI. It provides tasks for agents to learn from documents and create workflows, which are then converted into code for better understanding by LLMs. The tool measures performance based on pass rate and resolve rate, reflecting the correctness of workflow execution and task realization. ComfyAgent, a component of ComfyBench, autonomously designs new workflows by learning from existing ones, interpreting them as collaborative AI systems to complete given tasks.

GitHub | Stars: 135
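The pass rate and resolve rate described above can be illustrated with a minimal Python sketch. It assumes a hypothetical per-task record with two flags (`executed`, `resolved`) and assumes that a task only counts as resolved if its workflow also executed; ComfyBench's actual evaluation code and data format may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    # Hypothetical per-task record; not ComfyBench's actual data format.
    executed: bool  # the generated workflow ran without errors
    resolved: bool  # the output actually satisfied the task


def pass_rate(outcomes: list[TaskOutcome]) -> float:
    """Fraction of tasks whose workflow executed successfully."""
    if not outcomes:
        return 0.0
    return sum(o.executed for o in outcomes) / len(outcomes)


def resolve_rate(outcomes: list[TaskOutcome]) -> float:
    """Fraction of tasks that both executed and were resolved (assumption)."""
    if not outcomes:
        return 0.0
    return sum(o.executed and o.resolved for o in outcomes) / len(outcomes)


if __name__ == "__main__":
    runs = [
        TaskOutcome(executed=True, resolved=True),
        TaskOutcome(executed=True, resolved=False),
        TaskOutcome(executed=False, resolved=False),
        TaskOutcome(executed=True, resolved=True),
    ]
    print(pass_rate(runs), resolve_rate(runs))  # 0.75 0.5
```

Because a workflow can run cleanly yet still miss the task goal, the resolve rate in this sketch is never higher than the pass rate.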

MARBLE

MARBLE (Multi-Agent Coordination Backbone with LLM Engine) is a modular framework for developing, testing, and evaluating multi-agent systems leveraging Large Language Models. It provides a structured environment for agents to interact in simulated environments, utilizing cognitive abilities and communication mechanisms for collaborative or competitive tasks. The framework features modular design, multi-agent support, LLM integration, shared memory, flexible environments, metrics and evaluation, industrial coding standards, and Docker support.

GitHub | Stars: 61

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

GitHub | Stars: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub | Stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models, covering everything from images to sound.

GitHub | Stars: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

GitHub | Stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

GitHub | Stars: 2.3k
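The baseline-versus-iteration comparison described above can be sketched in plain Python. This is not PyRIT's API: the `(harm_category, attack_succeeded)` result pairs, the function names `success_rates` and `regressions`, and the 0.05 tolerance are hypothetical choices for illustration only.

```python
from collections import defaultdict


def success_rates(results):
    """Per-category attack success rate from (harm_category, attack_succeeded) pairs.

    Hypothetical input format; PyRIT's own result objects differ.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, succeeded in results:
        totals[category] += 1
        hits[category] += int(succeeded)
    return {category: hits[category] / totals[category] for category in totals}


def regressions(baseline, current, tolerance=0.05):
    """Categories whose attack success rate rose by more than `tolerance`."""
    return sorted(
        category
        for category, rate in current.items()
        if rate - baseline.get(category, 0.0) > tolerance
    )


if __name__ == "__main__":
    baseline = success_rates([("jailbreak", True), ("jailbreak", False),
                              ("malware", False), ("malware", False)])
    current = success_rates([("jailbreak", False), ("jailbreak", False),
                             ("malware", True), ("malware", False)])
    print(regressions(baseline, current))  # ['malware']
```

A higher attack success rate means the model or pipeline got worse in that category, so only increases are flagged; treating a category missing from the baseline as 0.0 is a further assumption of this sketch.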

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
- Supports consumer-grade GPUs.

GitHub | Stars: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

GitHub | Stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

GitHub | Stars: 675