Awesome-Embodied-AI
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
Stars: 349
Awesome-Embodied-AI is a curated list of papers on Embodied AI and related resources, tracking and summarizing research and industrial progress in the field. It includes surveys, workshops, tutorials, talks, blogs, and papers covering various aspects of Embodied AI, such as vision-language navigation, large language model-based agents, robotics, and more. The repository welcomes contributions and aims to provide a comprehensive overview of the advancements in Embodied AI.
README:
A curated list of awesome papers on Embodied AI and related research/industry-driven resources, inspired by awesome-computer-vision.
Embodied AI has led to new breakthroughs, and this repository tracks and summarizes research and industrial progress in the field.
- Contributions are highly welcome; feel free to submit a pull request or contact me.
If you find this repository helpful, please consider starring ⭐ or sharing ⬆️ it.
- CVPR-Workshop
- ICCV-Workshop
- CS539-OregonStateUniversity
- ChatGPT for Robotics: Design Principles and Model Abilities
Please also consider this fantastic paper: Agent AI: Surveying the Horizons of Multimodal Interaction
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
- Vision-Language Navigation with Embodied Intelligence: A Survey
- The Rise and Potential of Large Language Model Based Agents: A Survey
- A Survey of Embodied AI: From Simulators to Research Tasks
- A Survey on LLM-based Autonomous Agents
- Mindstorms in Natural Language-Based Societies of Mind
- Lifelong Learning of Large Language Model based Agents: A Roadmap
- Data Interpreter: An LLM Agent For Data Science
- Communicative Agents for Software Development
- Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
- Experiential Co-Learning of Software-Developing Agents
- EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
- A survey of embodied AI: From simulators to research tasks
- Embodied AI in education: A review on the body, environment, and mind
- Agent AI: Surveying the horizons of multimodal interaction
- Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues
- Alexa Arena: A user-centric interactive platform for embodied AI
- Artificial intelligence education for young children: A case study of technology‐enhanced embodied learning
- EmbodiedGPT: Vision-language pre-training via embodied chain of thought
- Multimodal embodied interactive agent for cafe scene
- The Essential Role of Causality in Foundation World Models for Embodied AI
- A Survey on Robotics with Foundation Models: toward Embodied AI
- Where are we in the search for an artificial visual cortex for embodied intelligence?
- A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents
- The sense of agency in human–AI interactions
- "Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations
- Vision-Language Navigation with Embodied Intelligence: A Survey
- Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation
- VELMA: Verbalization embodiment of LLM agents for vision and language navigation in Street View
- Spatially-Aware Transformer Memory for Embodied Agents
- VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
- Embodied Human Activity Recognition
- LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents
- EDGI: Equivariant diffusion for planning with embodied agents
- Large Multimodal Agents: A Survey
- Egocentric Planning for Scalable Embodied Task Achievement
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
- Human-agent teams in VR and the effects on trust calibration
- Talk with Ted: an embodied conversational agent for caregivers
- MOPA: Modular Object Navigation With PointGoal Agents
- Embodied Conversational Agents for Chronic Diseases: Scoping Review
- Towards anatomy education with generative AI-based virtual assistants in immersive virtual reality environments
- Improving Knowledge Extraction from LLMs for Task Learning through Agent Analysis
- A Survey on Large Language Model-Based Game Agents
- AutoRT: Embodied foundation models for large scale orchestration of robotic agents
- Towards Heterogeneous Multi-Agent Systems in Space
- Embodied Machine Learning
- Penetrative AI: Making LLMs comprehend the physical world
- WebVLN: Vision-and-Language Navigation on Websites
- Generating meaning: active inference and the scope and limits of passive AI
- RoboHive: A Unified Framework for Robot Learning
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
- Turing Test in the Era of LLM
- Generative Models for Decision Making
- AgentScope: A Flexible yet Robust Multi-Agent Platform
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
- MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion
- Vision-Language Navigation with Embodied Intelligence: A Survey
- An Interactive Agent Foundation Model
- UFO: A UI-Focused Agent for Windows OS Interaction
- Thanks to GT-RIPL's repository
- Thanks to Jacob Rintamaki's repository
- Thanks to Jiankai-Sun's repository
- Thanks to Yafei Hu's repository
- Thanks to Changan's repository
- Thanks to Rui's repository
- An Interactive Agent Foundation Model
- AutoGen, EcoOptiGen
- AgentTuning: Enabling Generalized Agent Abilities For LLMs
- AgentBench: Evaluating LLMs as Agents
- The Rise and Potential of Large Language Model Based Agents: A Survey
- An Open-source Framework for Autonomous Language Agents
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
- Embodied Task Planning with Large Language Models
- Building Cooperative Embodied Agents Modularly with Large Language Models
- State-Maintaining Language Models for Embodied Reasoning
- Embodied Executable Policy Learning with Language-based Scene Summarization
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
- Vision-Language Tasks
- Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf
- Language Guided Generation of 3D Embodied AI Environments
- CogAgent: Visual Expert for Pretrained Language Models
- ProAgent: from Robotic Process Automation to Agentic Process Automation
- Waymax: An accelerated simulator for autonomous driving research
- How Far Are Large Language Models from Agents with Theory-of-Mind?
- AgentBench: Evaluating LLMs as Agents
- MindAgent: Emergent Gaming Interaction
- Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
- Emergent Communication for Embodied Control
- Simple but Effective: CLIP Embeddings for Embodied AI
- Embodied AI-Driven Operation of Smart Cities: A Concise Review
- Modeling Dynamic Environments with Scene Graph Memory
- An Open-source Framework for Autonomous Language Agents
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
Alternative AI tools for Awesome-Embodied-AI
Similar Open Source Tools
Awesome-Papers-Autonomous-Agent
Awesome-Papers-Autonomous-Agent is a curated collection of recent papers focusing on autonomous agents, specifically interested in RL-based agents and LLM-based agents. The repository aims to provide a comprehensive resource for researchers and practitioners interested in intelligent agents that can achieve goals, acquire knowledge, and continually improve. The collection includes papers on various topics such as instruction following, building agents based on world models, using language as knowledge, leveraging LLMs as a tool, generalization across tasks, continual learning, combining RL and LLM, transformer-based policies, trajectory to language, trajectory prediction, multimodal agents, training LLMs for generalization and adaptation, task-specific designing, multi-agent systems, experimental analysis, benchmarking, applications, algorithm design, and combining with RL.
awesome-lifelong-llm-agent
This repository is a collection of papers and resources related to Lifelong Learning of Large Language Model (LLM) based Agents. It focuses on continual learning and incremental learning of LLM agents, identifying key modules such as Perception, Memory, and Action. The repository serves as a roadmap for understanding lifelong learning in LLM agents and provides a comprehensive overview of related research and surveys.
rlhf_thinking_model
This repository is a collection of research notes and resources focusing on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It includes methodologies, techniques, and state-of-the-art approaches for optimizing preferences and model alignment in LLM training. The purpose is to serve as a reference for researchers and engineers interested in reinforcement learning, large language models, model alignment, and alternative RL-based methods.
agentsociety
AgentSociety is an advanced framework designed for building agents in urban simulation environments. It integrates LLMs' planning, memory, and reasoning capabilities to generate realistic behaviors. The framework supports dataset-based, text-based, and rule-based environments with interactive visualization. It includes tools for interviews, surveys, interventions, and metric recording tailored for social experimentation.
agent
Stately Agent is a library for building stateful, interactive agents using OpenAI's GPT-3 API. With Stately Agent, you can create agents that can remember past conversations, track state, and generate text that is both informative and engaging.
LLMs-in-science
The 'LLMs-in-science' repository is a collaborative environment for organizing papers related to large language models (LLMs) and autonomous agents in the field of chemistry. The goal is to discuss trending topics, challenges, and the potential for supporting scientific discovery in the context of artificial intelligence. The repository aims to maintain a systematic structure of the field and welcomes contributions from the community to keep the content up-to-date and relevant.
Awesome-LLM-in-Social-Science
This repository compiles a list of academic papers that evaluate, align, simulate, and provide surveys or perspectives on the use of Large Language Models (LLMs) in the field of Social Science. The papers cover various aspects of LLM research, including assessing their alignment with human values, evaluating their capabilities in tasks such as opinion formation and moral reasoning, and exploring their potential for simulating social interactions and addressing issues in diverse fields of Social Science. The repository aims to provide a comprehensive resource for researchers and practitioners interested in the intersection of LLMs and Social Science.
MMMU
MMMU is a benchmark designed to evaluate multimodal models on college-level subject knowledge tasks, covering 30 subjects and 183 subfields with 11.5K questions. It focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of various models highlights substantial challenges, with room for improvement to stimulate the community towards expert artificial general intelligence (AGI).
awesome-ai
Awesome AI is a curated list of artificial intelligence resources including courses, tools, apps, and open-source projects. It covers a wide range of topics such as machine learning, deep learning, natural language processing, robotics, conversational interfaces, data science, and more. The repository serves as a comprehensive guide for individuals interested in exploring the field of artificial intelligence and its applications across various domains.
LAMBDA
LAMBDA is a code-free multi-agent data analysis system that utilizes large models to address data analysis challenges in complex data-driven applications. It allows users to perform complex data analysis tasks through human language instruction, seamlessly generate and debug code using two key agent roles, integrate external models and algorithms, and automatically generate reports. The system has demonstrated strong performance on various machine learning datasets, enhancing data science practice by integrating human and artificial intelligence.
policy-synth
Policy Synth is a TypeScript class library that empowers better decision-making for governments and companies by integrating collective and artificial intelligence. It streamlines processes through multi-scale AI agent logic flows, robust APIs, and cutting-edge real-time AI-driven web applications. The tool supports organizations in generating, refining, and implementing smarter, data-informed strategies, fostering collaboration with AI to tackle complex challenges effectively.
awesome-llms-fine-tuning
This repository is a curated collection of resources for fine-tuning Large Language Models (LLMs) like GPT, BERT, RoBERTa, and their variants. It includes tutorials, papers, tools, frameworks, and best practices to aid researchers, data scientists, and machine learning practitioners in adapting pre-trained models to specific tasks and domains. The resources cover a wide range of topics related to fine-tuning LLMs, providing valuable insights and guidelines to streamline the process and enhance model performance.
AI6127
AI6127 is a course focusing on deep neural networks for natural language processing (NLP). It covers core NLP tasks and machine learning models, emphasizing deep learning methods using libraries like PyTorch. The course aims to teach students state-of-the-art techniques for practical NLP problems, including writing, debugging, and training deep neural models. It also explores advancements in NLP such as Transformers and ChatGPT.
llm_benchmarks
llm_benchmarks is a collection of benchmarks and datasets for evaluating Large Language Models (LLMs). It includes various tasks and datasets to assess LLMs' knowledge, reasoning, language understanding, and conversational abilities. The repository aims to provide comprehensive evaluation resources for LLMs across different domains and applications, such as education, healthcare, content moderation, coding, and conversational AI. Researchers and developers can leverage these benchmarks to test and improve the performance of LLMs in various real-world scenarios.
For similar tasks
genai-os
Kuwa GenAI OS is an open, free, secure, and privacy-focused Generative-AI Operating System. It provides a multi-lingual turnkey solution for GenAI development and deployment on Linux and Windows. Users can enjoy features such as concurrent multi-chat, quoting, full prompt-list import/export/share, and flexible orchestration of prompts, RAGs, bots, models, and hardware/GPUs. The system supports various environments from virtual hosts to cloud, and it is open source, allowing developers to contribute and customize according to their needs.
Neurite
Neurite is an innovative project that combines chaos theory and graph theory to create a digital interface that explores hidden patterns and connections for creative thinking. It offers a unique workspace blending fractals with mind mapping techniques, allowing users to navigate the Mandelbrot set in real-time. Nodes in Neurite represent various content types like text, images, videos, code, and AI agents, enabling users to create personalized microcosms of thoughts and inspirations. The tool supports synchronized knowledge management through bi-directional synchronization between mind-mapping and text-based hyperlinking. Neurite also features FractalGPT for modular conversation with AI, local AI capabilities for multi-agent chat networks, and a Neural API for executing code and sequencing animations. The project is actively developed with plans for deeper fractal zoom, advanced control over node placement, and experimental features.
fast-stable-diffusion
Fast-stable-diffusion is a project that offers notebooks for RunPod, Paperspace, and Colab Pro adaptations with AUTOMATIC1111 Webui and Dreambooth. It provides tools for running and implementing Dreambooth, a Stable Diffusion project. The project includes implementations by XavierXiao and is sponsored by RunPod, Paperspace, and Colab Pro.
big-AGI
big-AGI is an AI suite designed for professionals seeking function, form, simplicity, and speed. It offers best-in-class Chats, Beams, and Calls with AI personas, visualizations, coding, drawing, side-by-side chatting, and more, all wrapped in a polished UX. The tool is powered by the latest models from 12 vendors and open-source servers, providing users with advanced AI capabilities and a seamless user experience. With continuous updates and enhancements, big-AGI aims to stay ahead of the curve in the AI landscape, catering to the needs of both developers and AI enthusiasts.
generative-ai
This repository contains code related to Generative AI that accompanies YouTube video content. It includes various notebooks and files for different days covering topics like map reduce, text to SQL, LLM parameters, tagging, and a Kaggle competition. The repository also includes resources like PDF files and databases for different projects related to Generative AI.
Cradle
The Cradle project is a framework designed for General Computer Control (GCC), empowering foundation agents to excel in various computer tasks through strong reasoning abilities, self-improvement, and skill curation. It provides a standardized environment with minimal requirements, constantly evolving to support more games and software. The repository includes released versions, publications, and relevant assets.
azure-functions-openai-extension
Azure Functions OpenAI Extension is a project that adds support for OpenAI LLM (GPT-3.5-turbo, GPT-4) bindings in Azure Functions. It provides NuGet packages for various functionalities like text completions, chat completions, assistants, embeddings generators, and semantic search. The project requires .NET 6 SDK or greater, Azure Functions Core Tools v4.x, and specific settings in Azure Function or local settings for development. It offers features like text completions, chat completion, assistants with custom skills, embeddings generators for text relatedness, and semantic search using vector databases. The project also includes examples in C# and Python for different functionalities.
rubra
Rubra is a collection of open-weight large language models enhanced with tool-calling capability. It allows users to call user-defined external tools in a deterministic manner while reasoning and chatting, making it ideal for agentic use cases. The models are further post-trained to teach instruct-tuned models new skills and mitigate catastrophic forgetting. Rubra extends popular inferencing projects for easy use, enabling users to run the models easily.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Here are some use cases for BricksLLM:
- Set LLM usage limits for users on different pricing tiers
- Track LLM usage on a per-user and per-organization basis
- Block or redact requests containing PII
- Improve LLM reliability with failovers, retries and caching
- Distribute API keys with rate limits and cost limits for internal development/production use cases
- Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.