ai-agent-papers

A collection of AI Agents papers (Updated biweekly)

Stars: 565

The AI Agents Papers repository provides a curated collection of papers focusing on AI agents, covering topics such as agent capabilities, applications, architectures, and presentations. It includes a variety of papers on ideation, decision making, long-horizon tasks, learning, memory-based agents, self-evolving agents, and more. The repository serves as a valuable resource for researchers and practitioners interested in AI agent technologies and advancements.

README:

AI Agents Papers

Updated biweekly.

AI Agent

AI agents can think, act, and complete tasks by themselves.
But can they really replace our jobs?

AI Agent Workflows

Paper Categories

🔥: Recommended papers
📖: Survey papers
⚖️: Benchmark papers

References

September Highlights (Updated 24 Sep)

Ideation & Decision Making

  • "The Need for Verification in AI-Driven Scientific Discovery" [paper]
  • "What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models" [paper]
  • "LLM-empowered Agents Simulation Framework for Scenario Generation in Service Ecosystem Governance Social World Models" [paper]
  • "Language Models Do Not Follow Occam’s Razor: A Benchmark for Inductive and Abductive Reasoning" [paper]
  • "LLM-empowered Agents Simulation Framework for Scenario Generation in Service Ecosystem Governance" [paper]
  • "VulAgent: A Hypothesis Validation-Based Multi-Agent System for Software Vulnerability Detection" [paper]
  • "Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building" [paper]
  • "Agents of Discovery" [paper]

Long-Horizon Task

  • "ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization" [paper]
  • "Empowering LLMs with Parameterized Skills for Adversarial Long-Horizon Planning" [paper]
  • "The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs" [paper]
  • "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents" [paper]

Long-Context Task

  • ⚖️ "LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering" [paper]
  • "SWE-QA: Can Language Models Answer Repository-level Code Questions?" [paper]

Learning

  • "rStar2-Agent: Agentic Reasoning Technical Report" [paper]
  • "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey" [paper]
  • "Scaling Agents via Continual Pre-training" [paper]
  • "ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization" [paper]
  • "ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory" [paper]

Survey

  • 📖 "LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios" [paper]
  • 📖 "Reinforcement Learning Foundations for Deep Research Systems: A Survey" [paper]

August Highlights

Self-Evolving Agents

  • "Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance" [paper]
  • 📖 "A Comprehensive Survey of Self-Evolving AI Agents" [paper]
  • "HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research" [paper]
  • "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
  • "SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents" [paper]
  • "HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents" [paper]
  • ⚖️ "Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark" [paper]

Memory-Based LLM Agents

  • "Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory" [paper]
  • "Memp: Exploring Agent Procedural Memory" [paper]
  • "Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science" [paper]
  • "Coarse-to-Fine Grounded Memory for LLM Agent Planning" [paper]
  • "Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework" [paper]
  • "Memento: Fine-tuning LLM Agents without Fine-tuning LLMs" [paper]
  • "Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning" [paper]

Ideation Agents

  • "K-Dense Analyst: Towards Fully Automated Scientific Analysis" [paper]
  • 📖 "From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery" [paper]
  • "The AI Data Scientist" [paper]
  • "Spacer: Towards Engineered Scientific Inspiration" [paper]
  • "BIODISCO: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation" [paper]
  • "Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization" [paper]
  • "MK2 at PBIG Competition: A Prompt Generation Solution" [paper]

July Highlights

Agent Blueprints

  • "LLM Agents Are the Antidote to Walled Gardens", University of Oxford. [paper]
  • "Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture", State Key Laboratory. [paper]
  • "Aime: Towards Fully-Autonomous Multi-Agent Framework", ByteDance. [paper]
  • 📖 "A Survey of Context Engineering for Large Language Models" [paper]
  • 📖 "A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents" [paper]
  • "From Reasoning to Super-Intelligence: A Search-Theoretic Perspective", AA-I. [paper]
  • "Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs", University of Michigan. [paper]

Agent Applications

  • "Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications", Shandong University. [paper]
  • "Emotionally Intelligent Task-oriented Dialogue Systems: Architecture, Representation, and Optimisation", Heinrich Heine University. [paper]
  • "Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI", TCS Research. [paper]
  • "Agent Exchange: Shaping the Future of AI Agent Economics", Shanghai Jiao Tong University. [paper]
  • "Evaluating LLM Agent Collusion in Double Auctions", Relativity, Stanford University, Arb Research. [paper]
  • "Enhancing COBOL Code Explanations: A Multi-Agents Approach Using Large Language Models", Queen’s University, IBM USA. [paper]
  • "CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale", Duke University, Army Research Laboratory. [paper]
  • "Deep Researcher with Test-Time Diffusion", Google. [paper]

Enterprise Agents

  • "AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents", Accenture. [paper]
  • "Agentic Retrieval of Topics and Insights from Earnings Calls", Bloomberg. [paper]
  • ⚖️ "Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments", Fudan University. [paper]
  • "Routine: A Structural Planning Framework for LLM Agent System in Enterprise", Digital China AI Research. [paper]
  • "Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance", ByteDance. [paper]
  • "Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments", Meta. [paper]

Data Agents

  • "Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems", Tsinghua University. [paper]
  • ⚖️ "DABstep: Data Agent Benchmark for Multi-step Reasoning", Adyen, Hugging Face. [paper]
  • 📖 "Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence", Zhejiang University. [paper]

Research Agents

  • 📖 "The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist", University of North Texas. [paper]
  • "AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench", Meta. [paper]
  • "Open-ended Scientific Discovery via Bayesian Surprise", Allen Institute for AI. [paper]
  • "Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery", Wrocław University. [paper]

Role Playing Agents

  • "Too Human to Model: The Uncanny Valley of LLMs in Social Simulation", Atmospheric Environmental Research. [paper]
  • "Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust", CAMEL-AI.org. [paper]
  • "LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra", Princeton University, Salesforce Research. [paper]
  • 📖 "Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle" [paper]
  • "Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models", CIFAR AI Chair. [paper]

Memory

  • "MemOS: A Memory OS for AI System", MemTensor (Shanghai) Technology Co., Ltd. [paper]
  • ⚖️ "Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions", UC San Diego. [paper]
  • "MIRIX: Multi-Agent Memory System for LLM-Based Agents", MIRIX AI. [paper]

June Highlights

Deep Research Agents

  • ⚖️ "DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents" [paper]
  • 📖 "From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents" [paper]
  • 📖 "Deep Research Agents: A Systematic Examination And Roadmap" [paper]
  • 📖 "Towards AI Search Paradigm" [paper]
  • "Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge" [paper]
  • "MMSearch-R1: Incentivizing LMMs to Search" [paper]
  • "Towards Robust Fact-Checking: A Multi-Agent System with Advanced Evidence Retrieval" [paper]

Data Science Agents

  • [Jun 2025] "AUTOMIND: Adaptive Knowledgeable Agent for Automated Data Science" [paper]
  • 📖 [Jun 2025] "Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents" [paper]
  • [Jun 2025] "SheetMind: An End-to-End LLM-Powered Multi-Agent Framework for Spreadsheet Automation" [paper]
  • [Jun 2025] "SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications" [paper]
  • [Jun 2025] "Towards Community-Driven Agents for Machine Learning Engineering" [paper]
  • [Jun 2025] "MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement" [paper]

Business Operation Agents

  • "Oversight Structures for Agentic AI in Public-Sector Organizations" [paper]
  • ⚖️ "AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance" [paper]
  • 📖 "Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives" [paper]
  • "Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era" [paper]
  • "Improved LLM Agents for Financial Document Question Answering" [paper]
  • ⚖️ "ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering" [paper]
  • "Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine" [paper]
  • "SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models" [paper]
  • ⚖️ "SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents" [paper]
  • "Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents" [paper]
  • "AgenticControl: An Automated Control Design Framework Using Large Language Models" [paper]
  • 📖 "A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools" [paper]

May Highlights

Inference Time Computing

  • 📖 "A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law" [paper]
  • 📖 "Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models" [paper]

Tool Integrated Reasoning

  • "Table-R1: Inference-Time Scaling for Table Reasoning" [paper]
  • "Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning" [paper]
  • "Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning" [paper]
  • "Agent RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving" [paper]
  • "Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent" [paper]
  • "An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents" [paper]
  • "Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning" [paper]
  • "MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning" [paper]
  • "EvolveSearch: An Iterative Self-Evolving Search Agent" [paper]
  • "VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection" [paper]
  • "Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning" [paper]

Self-Improvement & Self-Evolution

Metric & Reward

  • "RM-R1: Reward Modeling as Reasoning" [paper]
  • "Reward Reasoning Model" [paper]
  • "R3: Robust Rubric-Agnostic Reward Models" [paper]
  • "AutoLibra: Agent Metric Induction from Open-Ended Feedback" [paper]

Memory

  • "MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models (Short Version)" [paper]
  • "MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents" [paper]
  • "MARK: Memory Augmented Refinement of Knowledge" [paper]
  • 📖 "Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions" [paper]

Skills

  • "Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs" [paper]
  • "Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution" [paper]
  • "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution" [paper]

Reasoning Model

  • "Absolute Zero: Reinforced Self-play Reasoning with Zero Data" [paper]
  • "Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks" [paper]
  • "DEBATE, TRAIN, EVOLVE: Self-Evolution of Language Model Reasoning" [paper]
  • "Self Rewarding Self Improving" [paper]
  • "EvolveSearch: An Iterative Self-Evolving Search Agent" [paper]

(Multi) Agent Architecture

  • "AlphaEvolve: A coding agent for scientific and algorithmic discovery" [paper]
  • "Meta-Design Matters: A Self-Design Multi-Agent System" [paper]
  • "Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents" [paper]
  • "SEW: Self-Evolving Agentic Workflows for Automated Code Generation" [paper]
  • "Multi-Agent Collaboration via Evolving Orchestration" [paper]

Multi-Agent

  • 📖 "Creativity in LLM-based Multi-Agent Systems: A Survey" [paper]
  • ⚖️ "Benchmarking LLMs’ Swarm intelligence" [paper]
  • "Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems" [paper]
  • "Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications" [paper]
  • "Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study" [paper]

Real-World Application of AI Agents

Researcher

  • "34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery" [paper]
  • "PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration" [paper]
  • "R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution" [paper]
  • 📖 "From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery" [paper]
  • "Towards Artificial Intelligence Research Assistant for Expert-Involved Learning" [paper]

Data Scientist

  • "MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering" [paper]
  • "ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering" [paper]
  • "Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics" [paper]
  • "Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories" [paper]
  • "JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation" [paper]
  • "MLZero: A Multi-Agent System for End-to-end Machine Learning Automation" [paper]

Software Engineer

  • "Can Agents Fix Agent Issues?" [paper]
  • "Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI" [paper]

Others

  • "The Real Barrier to LLM Agent Usability is Agentic ROI" [paper]
  • 📖 "A Survey on Large Language Model based Human-Agent Systems" [paper]
  • 📖 "Vision-Language-Action Models: Concepts, Progress, Applications and Challenges" [paper]
  • 📖 "Multi-agent Embodied AI: Advances and Future Directions" [paper]
  • "Efficient Agent Training for Computer Use" [paper]
  • ⚖️ "AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios" [paper]

April Highlights

Inference Time Computing

  • "Inference-Time Scaling for Generalist Reward Modeling" [paper]
  • "Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead" [paper]
  • "Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection" [paper]
  • "Dual Engines of Thoughts: A Depth-Breadth Integration Framework for Open-Ended Analysis" [paper]
  • 📖 "A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems" [paper]

Self-Experience-Driven Agents

  • "Welcome to the Era of Experience" [paper]
  • "SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills" [paper]
  • "Exploring Expert Failures Improves LLM Agent Tuning" [paper]
  • "Inducing Programmatic Skills for Agentic Tasks" [paper]
  • "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory" [paper]
  • "Local Prompt Optimization" [paper]
  • "Revisiting Prompt Optimization with Large Reasoning Models-A Case Study on Event Extraction" [paper]
  • "Iterative Trajectory Exploration for Multimodal Agents" [paper]

Meta Agents

  • "FlowReasoner: Reinforcing Query-Level Meta-Agents" [paper]
  • "A Self-Improving Coding Agent" [paper]
  • "Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models" [paper]

Reinforcement Learning Applications for AI Agents

  • "ToolRL: Reward is All Tool Learning Needs" [paper]
  • "OTC: Optimal Tool Calls via Reinforcement Learning" [paper]
  • "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities" [paper]
  • 📖 "Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey" [paper]

Real-World Application of AI Agents

  • "The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search" [paper]
  • "UFO2: The Desktop AgentOS" [paper]
  • "AGENTADA: Skill-Adaptive Data Analytics for Tailored Insight Discovery" [paper]
  • ⚖️ "BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents" [paper]
  • "Toward Super Agent System with Hybrid AI Router" [paper]
  • "AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents" [paper]
  • [Apr 2025] "UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents" [paper]
  • 📖 "Challenges and Paths Towards AI for Software Engineering" [paper]

Survey

  • 📖 "Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems" [paper]
  • 📖 "Adaptive Human-Agent Teaming: A Review of Empirical Studies from the Process Dynamics Perspective" [paper]
  • 📖 "A Survey of AI Agent Protocols" [paper]
