
Awesome-Papers-Autonomous-Agent
A collection of recent papers on building autonomous agents. Two topics are included: RL-based and LLM-based agents.
Stars: 521

Awesome-Papers-Autonomous-Agent is a curated collection of recent papers on autonomous agents, with a specific focus on RL-based and LLM-based agents. The repository aims to provide a comprehensive resource for researchers and practitioners interested in intelligent agents that can achieve goals, acquire knowledge, and continually improve. The collection includes papers on topics such as instruction following, building agents based on world models, using language as knowledge, leveraging LLMs as a tool, generalization across tasks, continual learning, combining RL and LLM, transformer-based policies, trajectory to language, trajectory prediction, multimodal agents, training LLMs for generalization and adaptation, task-specific design, multi-agent systems, experimental analysis, benchmarking, applications, algorithm design, and combining with RL.
README:
This is a collection of recent papers focusing on autonomous agents. Here is how Wikipedia defines an agent:
In artificial intelligence, an intelligent agent is an agent acting in an intelligent manner; it perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or acquiring knowledge. An intelligent agent may be simple or complex: A thermostat or other control system is considered an example of an intelligent agent, as is a human being, as is any system that meets the definition, such as a firm, a state, or a biome.
Thus, the defining trait of an agent is that it can achieve goals, acquire knowledge, and continually improve. Traditional agents in RL research are not considered in this collection. Although LLM-based agents have attracted much attention in recent research, RL-based agents still hold a special position. Specifically, this repo covers two types of agents: RL-based agents and LLM-based agents.
Note that this paper list is under active maintenance. Feel free to open an issue if you find any missing papers that fit the topic.
- 2024/01/31: Add a special list for surveys on autonomous agents.
- 2023/12/08: Add papers accepted by ICML'23 and ICLR'23
- 2023/11/08: Add papers accepted by NeurIPS'23. Add related links (project page or github) to these accepted papers
- 2023/10/25: Classify all papers based on their research topics. Check ToC for the standard of classification
- 2023/10/18: Release first version of collection, including papers submitted to ICLR 2024
Table of Contents
- A Survey on Large Language Model based Autonomous Agents
- The Rise and Potential of Large Language Model Based Agents: A Survey
- [NeurIPS'23] Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation
- [NeurIPS'23] Guide Your Agent with Adaptive Multimodal Rewards [project]
- Compositional Instruction Following with Language Models and Reinforcement Learning
- RT-1: Robotics Transformer for Real-World Control at Scale [blog]
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [blog]
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models [blog]
- [NeurIPS'23] Guide Your Agent with Adaptive Multimodal Rewards [project]
- LEO: An Embodied Generalist Agent in 3D World [project]
- [ICLR'23 Oral] Transformers are Sample-Efficient World Models [code]
- Learning to Model the World with Language
- MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning
- Learning with Language Inference and Tips for Continual Reinforcement Learning
- Informing Reinforcement Learning Agents by Grounding Natural Language to Markov Decision Processes
- Language Reward Modulation for Pretraining Reinforcement Learning
- [NeurIPS'23] Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
- [ICLR'23] Reward Design with Language Models [code]
- [ICML'23] RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents [Poster]
- [ICML'23] Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [Project][Code]
- [ICML'23] Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
- Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning
- Text2Reward: Dense Reward Generation with Language Models for Reinforcement Learning
- Language to Rewards for Robotic Skill Synthesis
- Eureka: Human-Level Reward Design via Coding Large Language Models
- STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
- ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning
- Online Continual Learning for Interactive Instruction Following Agents
- [NeurIPS'23] A Definition of Continual Reinforcement Learning
- [NeurIPS'23] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
- RoboGPT : An intelligent agent of making embodied long-term decisions for daily instruction tasks
- Can Language Agents Approach the Performance of RL? An Empirical Study On OpenAI Gym
- RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds
- [NeurIPS'23] Cross-Episodic Curriculum for Transformer Agents [project]
- [NeurIPS'23] State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding
- [NeurIPS'23] Semantic HELM: A Human-Readable Memory for Reinforcement Learning
- [ICML'23] Distilling Internet-Scale Vision-Language Models into Embodied Agents
- Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation
- Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human Gain
- A Competition Winning Deep Reinforcement Learning Agent in microRTS
- Aligning Agents like Large Language Models
- [ICML'23] PaLM-E: An Embodied Multimodal Language Model
- Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
- Multimodal Web Navigation with Instruction-Finetuned Foundation Models
- You Only Look at Screens: Multimodal Chain-of-Action Agents
- Learning Embodied Vision-Language Programming From Instruction, Exploration, and Environmental Feedback
- An Embodied Generalist Agent in 3D World
- JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models
- FireAct: Toward Language Agent Finetuning
- Adapting LLM Agents Through Communication
- AgentTuning: Enabling Generalized Agent Abilities for LLMs
- Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
- [NeurIPS'23] Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents
- [NeurIPS'23] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks [Github]
- Rethinking the Buyer's Inspection Paradox in Information Markets with Language Agents
- A Language-Agent Approach to Formal Theorem-Proving
- Agent Instructs Large Language Models to be General Zero-Shot Reasoners
- Ghost in the Minecraft: Hierarchical Agents for Minecraft via Large Language Models with Text-based Knowledge and Memory
- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research
- Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale
- Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
- CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving
- Building Cooperative Embodied Agents Modularly with Large Language Models
- OKR-Agent: An Object and Key Results Driven Agent System with Hierarchical Self-Collaboration and Self-Evaluation
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- AutoAgents: A Framework for Automatic Agent Generation
- Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
- Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View
- REX: Rapid Exploration and eXploitation for AI agents
- Emergence of Social Norms in Large Language Model-based Agent Societies
- Identifying the Risks of LM Agents with an LM-Emulated Sandbox
- Evaluating Multi-Agent Coordination Abilities in Large Language Models
- Large Language Models as Gaming Agents
- Benchmarking Large Language Models as AI Research Agents
- Adaptive Environmental Modeling for Task-Oriented Language Agents
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
- [ACL'24] A Controllable World of Apps and People for Benchmarking Interactive Coding Agents [website][blog]
- [ICLR'23] Task Ambiguity in Humans and Language Models [code]
- SmartPlay : A Benchmark for LLMs as Intelligent Agents
- AgentBench: Evaluating LLMs as Agents
- Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
- SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Game
- Evaluating Large Language Models at Evaluating Instruction Following
- CivRealm: A Learning and Reasoning Odyssey for Decision-Making Agents
- Lyfe Agents: generative agents for low-cost real-time social interactions
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- [ICLR'23 Oral] ReAct: Synergizing Reasoning and Acting in Language Models [code]
- [NeurIPS'23] AdaPlanner: Adaptive Planning from Feedback with Language Models [github]
- Prospector: Improving LLM Agents with Self-Asking and Trajectory Ranking
- Formally Specifying the High-Level Behavior of LLM-Based Agents
- Cumulative Reasoning With Large Language Models
Alternative AI tools for Awesome-Papers-Autonomous-Agent
Similar Open Source Tools

Awesome-Embodied-AI
Awesome-Embodied-AI is a curated list of papers on Embodied AI and related resources, tracking and summarizing research and industrial progress in the field. It includes surveys, workshops, tutorials, talks, blogs, and papers covering various aspects of Embodied AI, such as vision-language navigation, large language model-based agents, robotics, and more. The repository welcomes contributions and aims to provide a comprehensive overview of the advancements in Embodied AI.

LLM-PLSE-paper
LLM-PLSE-paper is a repository focused on the applications of Large Language Models (LLMs) in Programming Language and Software Engineering (PL/SE) domains. It covers a wide range of topics including bug detection, specification inference and verification, code generation, fuzzing and testing, code model and reasoning, code understanding, IDE technologies, prompting for reasoning tasks, and agent/tool usage and planning. The repository provides a comprehensive collection of research papers, benchmarks, empirical studies, and frameworks related to the capabilities of LLMs in various PL/SE tasks.

OpenManus-RL
OpenManus-RL is an open-source initiative focused on enhancing reasoning and decision-making capabilities of large language models (LLMs) through advanced reinforcement learning (RL)-based agent tuning. The project explores novel algorithmic structures, diverse reasoning paradigms, sophisticated reward strategies, and extensive benchmark environments. It aims to push the boundaries of agent reasoning and tool integration by integrating insights from leading RL tuning frameworks and continuously updating progress in a dynamic, live-streaming fashion.

gorilla
Gorilla is a tool that enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke. With Gorilla, you can use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. Gorilla also releases APIBench, the largest collection of APIs, curated and easy to train on!

SoM-LLaVA
SoM-LLaVA is a new data source and learning paradigm for Multimodal LLMs, empowering open-source Multimodal LLMs with Set-of-Mark prompting and improved visual reasoning ability. The repository provides a new dataset that is complementary to existing training sources, enhancing multimodal LLMs with Set-of-Mark prompting and improved general capacity. By adding 30k SoM data to the visual instruction tuning stage of LLaVA, the tool achieves 1% to 6% relative improvements on all benchmarks. Users can train SoM-LLaVA via command line and utilize the implementation to annotate COCO images with SoM. Additionally, the tool can be loaded in Huggingface for further usage.

Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
This repository is a collection of papers and resources related to recommendation systems, focusing on foundation models, transferable recommender systems, large language models, and multimodal recommender systems. It explores questions such as the necessity of ID embeddings, the shift from matching to generating paradigms, and the future of multimodal recommender systems. The papers cover various aspects of recommendation systems, including pretraining, user representation, dataset benchmarks, and evaluation methods. The repository aims to provide insights and advancements in the field of recommendation systems through literature reviews, surveys, and empirical studies.

interpret
InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and explain blackbox systems. InterpretML helps you understand your model's global behavior, or understand the reasons behind individual predictions. Interpretability is essential for:
- Model debugging - Why did my model make this mistake?
- Feature Engineering - How can I improve my model?
- Detecting fairness issues - Does my model discriminate?
- Human-AI cooperation - How can I understand and trust the model's decisions?
- Regulatory compliance - Does my model satisfy legal requirements?
- High-risk applications - Healthcare, finance, judicial, ...

cheat-sheet-pdf
The Cheat-Sheet Collection for DevOps, Engineers, IT professionals, and more is a curated list of cheat sheets for various tools and technologies commonly used in the software development and IT industry. It includes cheat sheets for Nginx, Docker, Ansible, Python, Go (Golang), Git, Regular Expressions (Regex), PowerShell, VIM, Jenkins, CI/CD, Kubernetes, Linux, Redis, Slack, Puppet, Google Cloud Developer, AI, Neural Networks, Machine Learning, Deep Learning & Data Science, PostgreSQL, Ajax, AWS, Infrastructure as Code (IaC), System Design, and Cyber Security.

shandu
Shandu is an advanced AI research system that automates comprehensive research processes using language models, web scraping, and iterative exploration to generate well-structured reports with citations. It features intelligent state-based workflow, deep exploration, multi-source information synthesis, enhanced web scraping, smart source evaluation, content analysis pipeline, comprehensive report generation, parallel processing, adaptive search strategy, and full citation management.

glossAPI
The glossAPI project aims to develop a Greek language model as open-source software, with code licensed under EUPL and data under Creative Commons BY-SA. The project focuses on collecting and evaluating open text sources in Greek, with efforts to prioritize and gather textual data sets. The project encourages contributions through the CONTRIBUTING.md file and provides resources in the wiki for viewing and modifying recorded sources. It also welcomes ideas and corrections through issue submissions. The project emphasizes the importance of open standards, ethically secured data, privacy protection, and addressing digital divides in the context of artificial intelligence and advanced language technologies.

agent-squad
Agent Squad is a flexible, lightweight open-source framework for orchestrating multiple AI agents to handle complex conversations. It intelligently routes queries, maintains context across interactions, and offers pre-built components for quick deployment. The system allows easy integration of custom agents and conversation messages storage solutions, making it suitable for various applications from simple chatbots to sophisticated AI systems, scaling efficiently.

LLM-FuzzX
LLM-FuzzX is an open-source user-friendly fuzz testing tool for large language models (e.g., GPT, Claude, LLaMA), equipped with advanced task-aware mutation strategies, fine-grained evaluation, and jailbreak detection capabilities. It helps researchers and developers quickly discover potential security vulnerabilities and enhance model robustness. The tool features a user-friendly web interface for visual configuration and real-time monitoring, supports various advanced mutation methods, integrates RoBERTa model for real-time jailbreak detection and evaluation, supports multiple language models like GPT, Claude, LLaMA, provides visualization analysis with seed flowcharts and experiment data statistics, and offers detailed logging support for main, mutation, and jailbreak logs.

AI_Gen_Novel
AI_Gen_Novel is a project exploring the limits of AI in writing online fiction. Leveraging large language models and multi-agent technology, the tool aims to automatically generate web novels by compressing long texts, optimizing prompts, and enhancing originality. The tool combines the core idea of RecurrentGPT with language-based iterative computation to create texts of any length. Future directions include enhancing model capabilities, optimizing program architecture, and introducing more prior knowledge for structured storytelling.

ExtractThinker
ExtractThinker is a library designed for extracting data from files and documents using Large Language Models (LLMs). It offers ORM-style interaction between files and LLMs, supporting multiple document loaders such as Tesseract OCR, Azure Form Recognizer, AWS Textract, and Google Document AI. Users can customize extraction using contract definitions, process documents asynchronously, handle various document formats efficiently, and split and process documents. The project is inspired by the LangChain ecosystem and focuses on Intelligent Document Processing (IDP) using LLMs to achieve high accuracy in document extraction tasks.
For similar tasks

Awesome-LLM-RAG
This repository, Awesome-LLM-RAG, aims to record advanced papers on Retrieval Augmented Generation (RAG) in Large Language Models (LLMs). It serves as a resource hub for researchers interested in promoting their work related to LLM RAG by updating paper information through pull requests. The repository covers various topics such as workshops, tutorials, papers, surveys, benchmarks, retrieval-enhanced LLMs, RAG instruction tuning, RAG in-context learning, RAG embeddings, RAG simulators, RAG search, RAG long-text and memory, RAG evaluation, RAG optimization, and RAG applications.

Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of papers on LLM inference and serving.

LLM-Tool-Survey
This repository contains a collection of papers related to tool learning with large language models (LLMs). The papers are organized according to the survey paper 'Tool Learning with Large Language Models: A Survey'. The survey focuses on the benefits and implementation of tool learning with LLMs, covering aspects such as task planning, tool selection, tool calling, response generation, benchmarks, evaluation, challenges, and future directions in the field. It aims to provide a comprehensive understanding of tool learning with LLMs and inspire further exploration in this emerging area.

Awesome-CVPR2024-ECCV2024-AIGC
A Collection of Papers and Codes for CVPR 2024 AIGC. This repository compiles and organizes research papers and code related to AIGC (AI-generated content) at CVPR 2024 and ECCV 2024. It serves as a valuable resource for individuals interested in the latest advancements in the field of computer vision and artificial intelligence. Users can find a curated list of papers and accompanying code repositories for further exploration and research. The repository encourages collaboration and contributions from the community through stars, forks, and pull requests.

LLMs-in-science
The 'LLMs-in-science' repository is a collaborative environment for organizing papers related to large language models (LLMs) and autonomous agents in the field of chemistry. The goal is to discuss trending topics, challenges, and the potential for supporting scientific discovery in the context of artificial intelligence. The repository aims to maintain a systematic structure of the field and welcomes contributions from the community to keep the content up-to-date and relevant.

awesome-lifelong-llm-agent
This repository is a collection of papers and resources related to Lifelong Learning of Large Language Model (LLM) based Agents. It focuses on continual learning and incremental learning of LLM agents, identifying key modules such as Perception, Memory, and Action. The repository serves as a roadmap for understanding lifelong learning in LLM agents and provides a comprehensive overview of related research and surveys.

LLM-Agent-Survey
LLM-Agent-Survey is a comprehensive repository that provides a curated list of papers related to Large Language Model (LLM) agents. The repository categorizes papers based on LLM-Profiled Roles and includes high-quality publications from prestigious conferences and journals. It aims to offer a systematic understanding of LLM-based agents, covering topics such as tool use, planning, and feedback learning. The repository also includes unpublished papers with insightful analysis and novelty, marked for future updates. Users can explore a wide range of surveys, tool use cases, planning workflows, and benchmarks related to LLM agents.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
- Supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.