AutoAgent
"AutoAgent: Fully-Automated and Zero-Code LLM Agent Framework"
Stars: 1939
AutoAgent is a fully-automated and zero-code framework that enables users to create and deploy LLM agents through natural language alone. It is a top performer on the GAIA Benchmark, equipped with a native self-managing vector database, and allows for easy creation of tools, agents, and workflows without any coding. AutoAgent seamlessly integrates with a wide range of LLMs and supports both function-calling and ReAct interaction modes. It is designed to be dynamic, extensible, customized, and lightweight, serving as a personal AI assistant.
README:
Welcome to AutoAgent! AutoAgent is a Fully-Automated and highly Self-Developing framework that enables users to create and deploy LLM agents through Natural Language Alone.
-
π Top Performer on the GAIA Benchmark AutoAgent has ranked the #1 spot among open-sourced methods, delivering comparable performance to OpenAI's Deep Research.
-
π Agentic-RAG with Native Self-Managing Vector Database AutoAgent equipped with a native self-managing vector database, outperforms industry-leading solutions like LangChain.
-
β¨ Agent and Workflow Create with Ease AutoAgent leverages natural language to effortlessly build ready-to-use tools, agents and workflows - no coding required.
-
π Universal LLM Support AutoAgent seamlessly integrates with A Wide Range of LLMs (e.g., OpenAI, Anthropic, Deepseek, vLLM, Grok, Huggingface ...)
-
π Flexible Interaction Benefit from support for both function-calling and ReAct interaction modes.
-
π€ Dynamic, Extensible, Lightweight AutoAgent is your Personal AI Assistant, designed to be dynamic, extensible, customized, and lightweight.
π Unlock the Future of LLM Agents. Try π₯AutoAgentπ₯ Now!
- [2025, Feb 17]: Β ππWe've updated and released AutoAgent v0.2.0 (formerly known as MetaChain). Detailed changes include: 1) fix the bug of different LLM providers from issues; 2) add automatic installation of AutoAgent in the container environment according to issues; 3) add more easy-to-use commands for the CLI mode. 4) Rename the project to AutoAgent for better understanding.
- [2025, Feb 10]: Β ππWe've released MetaChain!, including framework, evaluation codes and CLI mode! Check our paper for more details.
- β¨ Features
- π₯ News
- π How to Use AutoAgent
- β‘ Quick Start
- βοΈ Todo List
- π¬ How To Reproduce the Results in the Paper
- π Documentation
- π€ Join the Community
- π Acknowledgements
- π Cite
AutoAgent have an out-of-the-box multi-agent system, which you could choose user mode in the start page to use it. This multi-agent system is a general AI assistant, having the same functionality with OpenAI's Deep Research and the comparable performance with it in GAIA benchmark.
- π High Performance: Matches Deep Research using Claude 3.5 rather than OpenAI's o3 model.
- π Model Flexibility: Compatible with any LLM (including Deepseek-R1, Grok, Gemini, etc.)
- π° Cost-Effective: Open-source alternative to Deep Research's $200/month subscription
- π― User-Friendly: Easy-to-deploy CLI interface for seamless interaction
- π File Support: Handles file uploads for enhanced data interaction
π₯ Deep Research (aka User Mode)
The most distinctive feature of AutoAgent is its natural language customization capability. Unlike other agent frameworks, AutoAgent allows you to create tools, agents, and workflows using natural language alone. Simply choose agent editor or workflow editor mode to start your journey of building agents through conversations.
You can use agent editor as shown in the following figure.
Input what kind of agent you want to create. |
Automated agent profiling. |
Output the agent profiles. |
Create the desired tools. |
Input what do you want to complete with the agent. (Optional) |
Create the desired agent(s) and go to the next step. |
You can also create the agent workflows using natural language description with the workflow editor mode, as shown in the following figure. (Tips: this mode does not support tool creation temporarily.)
Input what kind of workflow you want to create. |
Automated workflow profiling. |
Output the workflow profiles. |
Input what do you want to complete with the workflow. (Optional) |
Create the desired workflow(s) and go to the next step. |
git clone https://github.com/HKUDS/AutoAgent.git
cd AutoAgent
pip install -e .We use Docker to containerize the agent-interactive environment. So please install Docker first. You don't need to manually pull the pre-built image, because we have let Auto-Deep-Research automatically pull the pre-built image based on your architecture of your machine.
Create an environment variable file, just like .env.template, and set the API keys for the LLMs you want to use. Not every LLM API Key is required, use what you need.
# Required Github Tokens of your own
GITHUB_AI_TOKEN=
# Optional API Keys
OPENAI_API_KEY=
DEEPSEEK_API_KEY=
ANTHROPIC_API_KEY=
GEMINI_API_KEY=
HUGGINGFACE_API_KEY=
GROQ_API_KEY=
XAI_API_KEY=[π¨ News: ] We have updated a more easy-to-use command to start the CLI mode and fix the bug of different LLM providers from issues. You can follow the following steps to start the CLI mode with different LLM providers with much less configuration.
You can run auto main to start full part of AutoAgent, including user mode, agent editor and workflow editor. Btw, you can also run auto deep-research to start more lightweight user mode, just like the Auto-Deep-Research project. Some configuration of this command is shown below.
-
--container_name: Name of the Docker container (default: 'deepresearch') -
--port: Port for the container (default: 12346) -
COMPLETION_MODEL: Specify the LLM model to use, you should follow the name of Litellm to set the model name. (Default:claude-3-5-sonnet-20241022) -
DEBUG: Enable debug mode for detailed logs (default: False) -
API_BASE_URL: The base URL for the LLM provider (default: None) -
FN_CALL: Enable function calling (default: None). Most of time, you could ignore this option because we have already set the default value based on the model name. -
git_clone: Clone the AutoAgent repository to the local environment (only support with theauto maincommand, default: True) -
test_pull_name: The name of the test pull. (only support with theauto maincommand, default: 'autoagent_mirror')
In the agent editor and workflow editor mode, we should clone a mirror of the AutoAgent repository to the local agent-interactive environment and let our AutoAgent automatically update the AutoAgent itself, such as creating new tools, agents and workflows. So if you want to use the agent editor and workflow editor mode, you should set the git_clone to True and set the test_pull_name to 'autoagent_mirror' or other branches.
Then I will show you how to use the full part of AutoAgent with the auto main command and different LLM providers. If you want to use the auto deep-research command, you can refer to the Auto-Deep-Research project for more details.
- set the
ANTHROPIC_API_KEYin the.envfile.
ANTHROPIC_API_KEY=your_anthropic_api_key- run the following command to start Auto-Deep-Research.
auto main # default model is claude-3-5-sonnet-20241022- set the
OPENAI_API_KEYin the.envfile.
OPENAI_API_KEY=your_openai_api_key- run the following command to start Auto-Deep-Research.
COMPLETION_MODEL=gpt-4o auto main- set the
MISTRAL_API_KEYin the.envfile.
MISTRAL_API_KEY=your_mistral_api_key- run the following command to start Auto-Deep-Research.
COMPLETION_MODEL=mistral/mistral-large-2407 auto main- set the
GEMINI_API_KEYin the.envfile.
GEMINI_API_KEY=your_gemini_api_key- run the following command to start Auto-Deep-Research.
COMPLETION_MODEL=gemini/gemini-2.0-flash auto main- set the
HUGGINGFACE_API_KEYin the.envfile.
HUGGINGFACE_API_KEY=your_huggingface_api_key- run the following command to start Auto-Deep-Research.
COMPLETION_MODEL=huggingface/meta-llama/Llama-3.3-70B-Instruct auto main- set the
GROQ_API_KEYin the.envfile.
GROQ_API_KEY=your_groq_api_key- run the following command to start Auto-Deep-Research.
COMPLETION_MODEL=groq/deepseek-r1-distill-llama-70b auto main- set the
OPENAI_API_KEYin the.envfile.
OPENAI_API_KEY=your_api_key_for_openai_compatible_endpoints- run the following command to start Auto-Deep-Research.
COMPLETION_MODEL=openai/grok-2-latest API_BASE_URL=https://api.x.ai/v1 auto mainWe recommend using OpenRouter as LLM provider of DeepSeek-R1 temporarily. Because official API of DeepSeek-R1 can not be used efficiently.
- set the
OPENROUTER_API_KEYin the.envfile.
OPENROUTER_API_KEY=your_openrouter_api_key- run the following command to start Auto-Deep-Research.
COMPLETION_MODEL=openrouter/deepseek/deepseek-r1 auto main- set the
DEEPSEEK_API_KEYin the.envfile.
DEEPSEEK_API_KEY=your_deepseek_api_key- run the following command to start Auto-Deep-Research.
COMPLETION_MODEL=deepseek/deepseek-chat auto mainAfter the CLI mode is started, you can see the start page of AutoAgent:
You can import the browser cookies to the browser environment to let the agent better access some specific websites. For more details, please refer to the cookies folder.
If you want to create tools from the third-party tool platforms, such as RapidAPI, you should subscribe tools from the platform and add your own API keys by running process_tool_docs.py.
python process_tool_docs.pyMore features coming soon! π Web GUI interface under development.
AutoAgent is continuously evolving! Here's what's coming:
- π More Benchmarks: Expanding evaluations to SWE-bench, WebArena, and more
- π₯οΈ GUI Agent: Supporting Computer-Use agents with GUI interaction
- π§ Tool Platforms: Integration with more platforms like Composio
- ποΈ Code Sandboxes: Supporting additional environments like E2B
- π¨ Web Interface: Developing comprehensive GUI for better user experience
Have ideas or suggestions? Feel free to open an issue! Stay tuned for more exciting updates! π
For the GAIA benchmark, you can run the following command to run the inference.
cd path/to/AutoAgent && sh evaluation/gaia/scripts/run_infer.shFor the evaluation, you can run the following command.
cd path/to/AutoAgent && python evaluation/gaia/get_score.pyFor the Agentic-RAG task, you can run the following command to run the inference.
Step1. Turn to this page and download it. Save them to your datapath.
Step2. Run the following command to run the inference.
cd path/to/AutoAgent && sh evaluation/multihoprag/scripts/run_rag.shStep3. The result will be saved in the evaluation/multihoprag/result.json.
A more detailed documentation is coming soon π, and we will update in the Documentation page.
We want to build a community for AutoAgent, and we welcome everyone to join us. You can join our community by:
- Join our Slack workspace - Here we talk about research, architecture, and future development.
- Join our Discord server - This is a community-run server for general discussion, questions, and feedback.
- Read or post Github Issues - Check out the issues we're working on, or add your own ideas.
Rome wasn't built in a day. AutoAgent stands on the shoulders of giants, and we are deeply grateful for the outstanding work that came before us. Our framework architecture draws inspiration from OpenAI Swarm, while our user mode's three-agent design benefits from Magentic-one's insights. We've also learned from OpenHands for documentation structure and many other excellent projects for agent-environment interaction design, among others. We express our sincere gratitude and respect to all these pioneering works that have been instrumental in shaping AutoAgent.
@misc{AutoAgent,
title={{AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents}},
author={Jiabin Tang, Tianyu Fan, Chao Huang},
year={2025},
eprint={202502.05957},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2502.05957},
}For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for AutoAgent
Similar Open Source Tools
AutoAgent
AutoAgent is a fully-automated and zero-code framework that enables users to create and deploy LLM agents through natural language alone. It is a top performer on the GAIA Benchmark, equipped with a native self-managing vector database, and allows for easy creation of tools, agents, and workflows without any coding. AutoAgent seamlessly integrates with a wide range of LLMs and supports both function-calling and ReAct interaction modes. It is designed to be dynamic, extensible, customized, and lightweight, serving as a personal AI assistant.
Auto-Deep-Research
Auto-Deep-Research is an open-source and cost-efficient alternative to OpenAI's Deep Research, based on the AutoAgent framework. It offers high performance, universal LLM support, flexible interaction, cost-efficiency, file support, and one-click launch. Users can seamlessly integrate with various LLMs, handle file uploads, and start instantly with a simple command. The tool aims to provide a fully-automated and personalized AI assistant at a lower cost, catering to community needs and showcasing the potential of AutoAgent for practical AI applications.
joinly
joinly.ai is a connector middleware designed to enable AI agents to actively participate in video calls, providing essential meeting tools for AI agents to perform tasks and interact in real time. It supports live interaction, conversational flow, cross-platform compatibility, bring-your-own-LLM, and choose-your-preferred-TTS/STT services. The tool is 100% open-source, self-hosted, and privacy-first, aiming to make meetings accessible to AI agents by joining and participating in video calls.
kwaak
Kwaak is a tool that allows users to run a team of autonomous AI agents locally from their own machine. It enables users to write code, improve test coverage, update documentation, and enhance code quality while focusing on building innovative projects. Kwaak is designed to run multiple agents in parallel, interact with codebases, answer questions about code, find examples, write and execute code, create pull requests, and more. It is free and open-source, allowing users to bring their own API keys or models via Ollama. Kwaak is part of the bosun.ai project, aiming to be a platform for autonomous code improvement.
Devon
Devon is an open-source pair programmer tool designed to facilitate collaborative coding sessions. It provides features such as multi-file editing, codebase exploration, test writing, bug fixing, and architecture exploration. The tool supports Anthropic, OpenAI, and Groq APIs, with plans to add more models in the future. Devon is community-driven, with ongoing development goals including multi-model support, plugin system for tool builders, self-hostable Electron app, and setting SOTA on SWE-bench Lite. Users can contribute to the project by developing core functionality, conducting research on agent performance, providing feedback, and testing the tool.
pear-landing-page
PearAI Landing Page is an open-source AI-powered code editor managed by Nang and Pan. It is built with Next.js, Vercel, Tailwind CSS, and TypeScript. The project requires setting up environment variables for proper configuration. Users can run the project locally by starting the development server and visiting the specified URL in the browser. Recommended extensions include Prettier, ESLint, and JavaScript and TypeScript Nightly. Contributions to the project are welcomed and appreciated.
PySpur
PySpur is a graph-based editor designed for LLM workflows, offering modular building blocks for easy workflow creation and debugging at node level. It allows users to evaluate final performance and promises self-improvement features in the future. PySpur is easy-to-hack, supports JSON configs for workflow graphs, and is lightweight with minimal dependencies, making it a versatile tool for workflow management in the field of AI and machine learning.
AiR
AiR is an AI tool built entirely in Rust that delivers blazing speed and efficiency. It features accurate translation and seamless text rewriting to supercharge productivity. AiR is designed to assist non-native speakers by automatically fixing errors and polishing language to sound like a native speaker. The tool is under heavy development with more features on the horizon.
bigcodebench
BigCodeBench is an easy-to-use benchmark for code generation with practical and challenging programming tasks. It aims to evaluate the true programming capabilities of large language models (LLMs) in a more realistic setting. The benchmark is designed for HumanEval-like function-level code generation tasks, but with much more complex instructions and diverse function calls. BigCodeBench focuses on the evaluation of LLM4Code with diverse function calls and complex instructions, providing precise evaluation & ranking and pre-generated samples to accelerate code intelligence research. It inherits the design of the EvalPlus framework but differs in terms of execution environment and test evaluation.
MetaGPT
MetaGPT is a multi-agent framework that enables GPT to work in a software company, collaborating to tackle more complex tasks. It assigns different roles to GPTs to form a collaborative entity for complex tasks. MetaGPT takes a one-line requirement as input and outputs user stories, competitive analysis, requirements, data structures, APIs, documents, etc. Internally, MetaGPT includes product managers, architects, project managers, and engineers. It provides the entire process of a software company along with carefully orchestrated SOPs. MetaGPT's core philosophy is "Code = SOP(Team)", materializing SOP and applying it to teams composed of LLMs.
PentestGPT
PentestGPT is a penetration testing tool empowered by ChatGPT, designed to automate the penetration testing process. It operates interactively to guide penetration testers in overall progress and specific operations. The tool supports solving easy to medium HackTheBox machines and other CTF challenges. Users can use PentestGPT to perform tasks like testing connections, using different reasoning models, discussing with the tool, searching on Google, and generating reports. It also supports local LLMs with custom parsers for advanced users.
trieve
Trieve is an advanced relevance API for hybrid search, recommendations, and RAG. It offers a range of features including self-hosting, semantic dense vector search, typo tolerant full-text/neural search, sub-sentence highlighting, recommendations, convenient RAG API routes, the ability to bring your own models, hybrid search with cross-encoder re-ranking, recency biasing, tunable popularity-based ranking, filtering, duplicate detection, and grouping. Trieve is designed to be flexible and customizable, allowing users to tailor it to their specific needs. It is also easy to use, with a simple API and well-documented features.
aira-dojo
aira-dojo is a scalable and customizable framework for AI research agents, designed to accelerate hill-climbing on research capabilities toward a fully automated AI research scientist. The framework provides a general abstraction for tasks and agents, implements the MLE-bench task, and includes state-of-the-art agents. It features an isolated code execution environment that integrates smoothly with job schedulers like Slurm, enabling large-scale experiments and rapid iteration across a portfolio of tasks and solvers.
RooFlow
RooFlow is a VS Code extension that enhances AI-assisted development by providing persistent project context and optimized mode interactions. It reduces token consumption and streamlines workflow by integrating Architect, Code, Test, Debug, and Ask modes. The tool simplifies setup, offers real-time updates, and provides clearer instructions through YAML-based rule files. It includes components like Memory Bank, System Prompts, VS Code Integration, and Real-time Updates. Users can install RooFlow by downloading specific files, placing them in the project structure, and running an insert-variables script. They can then start a chat, select a mode, interact with Roo, and use the 'Update Memory Bank' command for synchronization. The Memory Bank structure includes files for active context, decision log, product context, progress tracking, and system patterns. RooFlow features persistent context, real-time updates, mode collaboration, and reduced token consumption.
lexido
Lexido is an innovative assistant for the Linux command line, designed to boost your productivity and efficiency. Powered by Gemini Pro 1.0 and utilizing the free API, Lexido offers smart suggestions for commands based on your prompts and importantly your current environment. Whether you're installing software, managing files, or configuring system settings, Lexido streamlines the process, making it faster and more intuitive.
EasyInstruct
EasyInstruct is a Python package proposed as an easy-to-use instruction processing framework for Large Language Models (LLMs) like GPT-4, LLaMA, ChatGLM in your research experiments. EasyInstruct modularizes instruction generation, selection, and prompting, while also considering their combination and interaction.
For similar tasks
activepieces
Activepieces is an open source replacement for Zapier, designed to be extensible through a type-safe pieces framework written in Typescript. It features a user-friendly Workflow Builder with support for Branches, Loops, and Drag and Drop. Activepieces integrates with Google Sheets, OpenAI, Discord, and RSS, along with 80+ other integrations. The list of supported integrations continues to grow rapidly, thanks to valuable contributions from the community. Activepieces is an open ecosystem; all piece source code is available in the repository, and they are versioned and published directly to npmjs.com upon contributions. If you cannot find a specific piece on the pieces roadmap, please submit a request by visiting the following link: Request Piece Alternatively, if you are a developer, you can quickly build your own piece using our TypeScript framework. For guidance, please refer to the following guide: Contributor's Guide
bee-agent-framework
The Bee Agent Framework is an open-source tool for building, deploying, and serving powerful agentic workflows at scale. It provides AI agents, tools for creating workflows in Javascript/Python, a code interpreter, memory optimization strategies, serialization for pausing/resuming workflows, traceability features, production-level control, and upcoming features like model-agnostic support and a chat UI. The framework offers various modules for agents, llms, memory, tools, caching, errors, adapters, logging, serialization, and more, with a roadmap including MLFlow integration, JSON support, structured outputs, chat client, base agent improvements, guardrails, and evaluation.
mastra
Mastra is an opinionated Typescript framework designed to help users quickly build AI applications and features. It provides primitives such as workflows, agents, RAG, integrations, syncs, and evals. Users can run Mastra locally or deploy it to a serverless cloud. The framework supports various LLM providers, offers tools for building language models, workflows, and accessing knowledge bases. It includes features like durable graph-based state machines, retrieval-augmented generation, integrations, syncs, and automated tests for evaluating LLM outputs.
otto-m8
otto-m8 is a flowchart based automation platform designed to run deep learning workloads with minimal to no code. It provides a user-friendly interface to spin up a wide range of AI models, including traditional deep learning models and large language models. The tool deploys Docker containers of workflows as APIs for integration with existing workflows, building AI chatbots, or standalone applications. Otto-m8 operates on an Input, Process, Output paradigm, simplifying the process of running AI models into a flowchart-like UI.
flows-ai
Flows AI is a lightweight, type-safe AI workflow orchestrator inspired by Anthropic's agent patterns and built on top of Vercel AI SDK. It provides a simple and deterministic way to build AI workflows by connecting different input/outputs together, either explicitly defining workflows or dynamically breaking down complex tasks using an orchestrator agent. The library is designed without classes or state, focusing on flexible input/output contracts for nodes.
LangGraph-learn
LangGraph-learn is a community-driven project focused on mastering LangGraph and other AI-related topics. It provides hands-on examples and resources to help users learn how to create and manage language model workflows using LangGraph and related tools. The project aims to foster a collaborative learning environment for individuals interested in AI and machine learning by offering practical examples and tutorials on building efficient and reusable workflows involving language models.
xorq
Xorq (formerly LETSQL) is a data processing library built on top of Ibis and DataFusion to write multi-engine data workflows. It provides a flexible and powerful tool for processing and analyzing data from various sources, enabling users to create complex data pipelines and perform advanced data transformations.
beeai-framework
BeeAI Framework is a versatile tool for building production-ready multi-agent systems. It offers flexibility in orchestrating agents, seamless integration with various models and tools, and production-grade controls for scaling. The framework supports Python and TypeScript libraries, enabling users to implement simple to complex multi-agent patterns, connect with AI services, and optimize token usage and resource management.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.











