
Awesome-Embodied-AI
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
Stars: 349

Awesome-Embodied-AI is a curated list of papers on Embodied AI and related resources, tracking and summarizing research and industrial progress in the field. It includes surveys, workshops, tutorials, talks, blogs, and papers covering various aspects of Embodied AI, such as vision-language navigation, large language model-based agents, robotics, and more. The repository welcomes contributions and aims to provide a comprehensive overview of the advancements in Embodied AI.
README:
A curated list of awesome papers on Embodied AI and related research/industry-driven resources, inspired by awesome-computer-vision.
Embodied AI is seeing new breakthroughs, and this repository keeps tracking and summarizing research and industrial progress in the field.
- Contributions are highly welcome; feel free to submit a pull request or contact me.
If you find this repository helpful, please consider starring ⭐ or sharing ⬆️ it.
- CVPR-Workshop
- ICCV-Workshop
- CS539-OregonStateUniversity
- ChatGPT for Robotics: Design Principles and Model Abilities
Please do consider this fantastic paper: Agent AI: Surveying the Horizons of Multimodal Interaction
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
- Vision-Language Navigation with Embodied Intelligence: A Survey
- The Rise and Potential of Large Language Model Based Agents: A Survey
- A Survey of Embodied AI: From Simulators to Research Tasks
- A Survey on LLM-based Autonomous Agents
- Mindstorms in Natural Language-Based Societies of Mind
- Lifelong Learning of Large Language Model based Agents: A Roadmap
- Data Interpreter: An LLM Agent For Data Science
- Communicative Agents for Software Development
- Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
- Experiential Co-Learning of Software-Developing Agents
- EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
- A survey of embodied AI: From simulators to research tasks
- Embodied AI in education: A review on the body, environment, and mind
- Agent AI: Surveying the horizons of multimodal interaction
- Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues
- Alexa Arena: A user-centric interactive platform for embodied AI
- Artificial intelligence education for young children: A case study of technology‐enhanced embodied learning
- EmbodiedGPT: Vision-language pre-training via embodied chain of thought
- Multimodal embodied interactive agent for cafe scene
- The Essential Role of Causality in Foundation World Models for Embodied AI
- A Survey on Robotics with Foundation Models: toward Embodied AI
- Where are we in the search for an artificial visual cortex for embodied intelligence?
- A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents
- The sense of agency in human–AI interactions
- "Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations
- Vision-Language Navigation with Embodied Intelligence: A Survey
- Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation
- Spatially-Aware Transformer Memory for Embodied Agents
- VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
- Embodied Human Activity Recognition
- LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents
- EDGI: Equivariant diffusion for planning with embodied agents
- Large Multimodal Agents: A Survey
- Egocentric Planning for Scalable Embodied Task Achievement
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
- Human-agent teams in VR and the effects on trust calibration
- Talk with Ted: an embodied conversational agent for caregivers
- MOPA: Modular Object Navigation With PointGoal Agents
- Embodied Conversational Agents for Chronic Diseases: Scoping Review
- Towards anatomy education with generative AI-based virtual assistants in immersive virtual reality environments
- Improving Knowledge Extraction from LLMs for Task Learning through Agent Analysis
- A Survey on Large Language Model-Based Game Agents
- AutoRT: Embodied foundation models for large scale orchestration of robotic agents
- Towards Heterogeneous Multi-Agent Systems in Space
- Embodied Machine Learning
- Penetrative AI: Making LLMs comprehend the physical world
- WebVLN: Vision-and-Language Navigation on Websites
- Generating meaning: active inference and the scope and limits of passive AI
- RoboHive: A Unified Framework for Robot Learning
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
- Turing Test in the Era of LLM
- Generative Models for Decision Making
- AgentScope: A Flexible yet Robust Multi-Agent Platform
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
- MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion
- Vision-Language Navigation with Embodied Intelligence: A Survey
- An Interactive Agent Foundation Model
- UFO: A UI-Focused Agent for Windows OS Interaction
- Thanks to GT-RIPL's repository
- Thanks to Jacob Rintamaki's repository
- Thanks to Jiankai-Sun's repository
- Thanks to Yafei Hu's repository
- Thanks to Changan's repository
- Thanks to Rui's repository
- An Interactive Agent Foundation Model
- AutoGen, EcoOptiGen
- AgentTuning: Enabling Generalized Agent Abilities For LLMs
- AgentBench: Evaluating LLMs as Agents
- The Rise and Potential of Large Language Model Based Agents: A Survey
- An Open-source Framework for Autonomous Language Agents
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
- Embodied Task Planning with Large Language Models
- Building Cooperative Embodied Agents Modularly with Large Language Models
- State-Maintaining Language Models for Embodied Reasoning
- Embodied Executable Policy Learning with Language-based Scene Summarization
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
- Vision-Language Tasks
- Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf
- Language Guided Generation of 3D Embodied AI Environments
- CogAgent: Visual Expert for Pretrained Language Models
- ProAgent: from Robotic Process Automation to Agentic Process Automation
- Waymax: An accelerated simulator for autonomous driving research
- How FaR Are Large Language Models From Agents with Theory-of-Mind?
- AgentBench: Evaluating LLMs as Agents
- MindAgent: Emergent Gaming Interaction
- Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
- Emergent Communication for Embodied Control
- Simple but Effective: CLIP Embeddings for Embodied AI
- Embodied AI-Driven Operation of Smart Cities: A Concise Review
- Modeling Dynamic Environments with Scene Graph Memory
- An Open-source Framework for Autonomous Language Agents
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
Alternative AI tools for Awesome-Embodied-AI
Similar Open Source Tools

Awesome-Papers-Autonomous-Agent
Awesome-Papers-Autonomous-Agent is a curated collection of recent papers on autonomous agents, with a particular focus on RL-based and LLM-based agents. The repository aims to provide a comprehensive resource for researchers and practitioners interested in intelligent agents that can achieve goals, acquire knowledge, and continually improve. The collection includes papers on various topics such as instruction following, building agents based on world models, using language as knowledge, leveraging LLMs as a tool, generalization across tasks, continual learning, combining RL and LLM, transformer-based policies, trajectory to language, trajectory prediction, multimodal agents, training LLMs for generalization and adaptation, task-specific designing, multi-agent systems, experimental analysis, benchmarking, applications, algorithm design, and combining with RL.

ai_all_resources
This repository is a compilation of excellent ML and DL tutorials created by various individuals and organizations. It covers a wide range of topics, including machine learning fundamentals, deep learning, computer vision, natural language processing, reinforcement learning, and more. The resources are organized into categories, making it easy to find the information you need. Whether you're a beginner or an experienced practitioner, you're sure to find something valuable in this repository.

awesome-ai-coding
Awesome-AI-Coding is a curated list of AI coding topics, projects, datasets, LLM models, embedding models, papers, blogs, products, startups, and peer awesome lists related to artificial intelligence in coding. It includes tools for code completion, code generation, code documentation, and code search, as well as AI models and techniques for improving developer productivity. The repository also features information on various AI-powered developer tools, copilots, and related resources in the AI coding domain.

fastRAG
fastRAG is a research framework designed to build and explore efficient retrieval-augmented generative models. It incorporates state-of-the-art Large Language Models (LLMs) and Information Retrieval to give researchers and developers a comprehensive tool set for advancing retrieval-augmented generation. The framework is optimized for Intel hardware and customizable. Its key features include optimized RAG pipelines and efficient RAG components such as ColBERT and Fusion-in-Decoder (FiD). fastRAG supports various unique components and backends for running LLMs, making it a versatile tool for research and development in retrieval-augmented generation.

awesome-khmer-language
Awesome Khmer Language is a comprehensive collection of resources for the Khmer language, including tools, datasets, research papers, projects/models, blogs/slides, and miscellaneous items. It covers a wide range of topics related to Khmer language processing, such as character normalization, word segmentation, part-of-speech tagging, optical character recognition, text-to-speech, and more. The repository aims to support the development of natural language processing applications for the Khmer language by providing a diverse set of resources and tools for researchers and developers.

AI-Bootcamp
The AI Bootcamp is a comprehensive training program focusing on real-world applications to equip individuals with the skills and knowledge needed to excel as AI engineers. The bootcamp covers topics such as Real-World PyTorch, Machine Learning Projects, Fine-tuning Tiny LLM, Deployment of LLM to Production, AI Agents with GPT-4 Turbo, CrewAI, Llama 3, and more. Participants will learn foundational skills in Python for AI, ML Pipelines, Large Language Models (LLMs), AI Agents, and work on projects like RagBase for private document chat.

awesome-flux-ai
Awesome Flux AI is a curated list of resources, tools, libraries, and applications related to Flux AI technology. It serves as a comprehensive collection for developers, researchers, and enthusiasts interested in Flux AI. The platform offers open-source text-to-image AI models developed by Black Forest Labs, aiming to advance generative deep learning models for media, creativity, efficiency, and diversity.

app_generative_ai
This repository contains course materials for T81 559: Applications of Generative Artificial Intelligence at Washington University in St. Louis. The course covers practical applications of Large Language Models (LLMs) and text-to-image networks using Python. Students learn about generative AI principles, LangChain, Retrieval-Augmented Generation (RAG) model, image generation techniques, fine-tuning neural networks, and prompt engineering. Ideal for students, researchers, and professionals in computer science, the course offers a transformative learning experience in the realm of Generative AI.

awesome-quant-ai
Awesome Quant AI is a curated list of resources focusing on quantitative investment and trading strategies using artificial intelligence and machine learning in finance. It covers key challenges in quantitative finance, AI/ML technical fit, predictive modeling, sequential decision-making, synthetic data generation, contextual reasoning, mathematical foundations, design approach, quantitative trading strategies, tools and platforms, learning resources, books, research papers, community, and conferences. The repository aims to provide a comprehensive resource for those interested in the intersection of AI, machine learning, and quantitative finance, with a focus on extracting alpha while managing risk in financial systems.

awesome-mobile-robotics
The 'awesome-mobile-robotics' repository is a curated list of important content related to Mobile Robotics and AI. It includes resources such as courses, books, datasets, software and libraries, podcasts, conferences, journals, companies and jobs, laboratories and research groups, and miscellaneous resources. The repository covers a wide range of topics in the field of Mobile Robotics and AI, providing valuable information for enthusiasts, researchers, and professionals in the domain.

Awesome-AI-Data-Guided-Projects
A curated list of data science & AI guided projects to start building your portfolio. The repository contains guided projects covering various topics such as large language models, time series analysis, computer vision, natural language processing (NLP), and data science. Each project provides detailed instructions on how to implement specific tasks using different tools and technologies.

openrl
OpenRL is an open-source general reinforcement learning research framework that supports training for various tasks such as single-agent, multi-agent, offline RL, self-play, and natural language. Developed based on PyTorch, the goal of OpenRL is to provide a simple-to-use, flexible, efficient and sustainable platform for the reinforcement learning research community. It supports a universal interface for all tasks/environments, single-agent and multi-agent tasks, offline RL training with expert dataset, self-play training, reinforcement learning training for natural language tasks, DeepSpeed, Arena for evaluation, importing models and datasets from Hugging Face, user-defined environments, models, and datasets, gymnasium environments, callbacks, visualization tools, unit testing, and code coverage testing. It also supports various algorithms like PPO, DQN, SAC, and environments like Gymnasium, MuJoCo, Atari, and more.

llm-rag-vectordb-python
This repository provides sample applications and tutorials that showcase the power of Amazon Bedrock with Python. It helps Python developers understand how to harness Amazon Bedrock to build generative AI-enabled applications. The resources also demonstrate integration with vector databases using RAG (Retrieval-Augmented Generation) and services like Amazon Aurora, RDS, and OpenSearch. Additionally, it explores using LangChain and Streamlit to create effective experimental applications.

llamator
LLAMATOR is a Red Teaming Python framework designed for testing chatbots and LLM systems. It supports custom attacks and a wide range of attacks on RAG, agent, and prompt targets in English and Russian. It also offers custom configuration of chat clients, history tracking of attack requests and responses in Excel and CSV formats, and test report generation in DOCX format. The tool is classified under OWASP as addressing prompt injection, system prompt leakage, and misinformation. It is supported by the AI Security Lab ITMO, Raft Security, and AI Talent Hub, and is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
For similar tasks

genai-os
Kuwa GenAI OS is an open, free, secure, and privacy-focused Generative-AI Operating System. It provides a multi-lingual turnkey solution for GenAI development and deployment on Linux and Windows. Users can enjoy features such as concurrent multi-chat, quoting, full prompt-list import/export/share, and flexible orchestration of prompts, RAGs, bots, models, and hardware/GPUs. The system supports various environments from virtual hosts to cloud, and it is open source, allowing developers to contribute and customize according to their needs.

Neurite
Neurite is an innovative project that combines chaos theory and graph theory to create a digital interface that explores hidden patterns and connections for creative thinking. It offers a unique workspace blending fractals with mind mapping techniques, allowing users to navigate the Mandelbrot set in real-time. Nodes in Neurite represent various content types like text, images, videos, code, and AI agents, enabling users to create personalized microcosms of thoughts and inspirations. The tool supports synchronized knowledge management through bi-directional synchronization between mind-mapping and text-based hyperlinking. Neurite also features FractalGPT for modular conversation with AI, local AI capabilities for multi-agent chat networks, and a Neural API for executing code and sequencing animations. The project is actively developed with plans for deeper fractal zoom, advanced control over node placement, and experimental features.

fast-stable-diffusion
Fast-stable-diffusion is a project that offers notebooks for RunPod, Paperspace, and Colab Pro, adapted to work with the AUTOMATIC1111 WebUI and Dreambooth. It provides tools for running and implementing Dreambooth, a fine-tuning technique for Stable Diffusion models. The project includes implementations by XavierXiao and is sponsored by Runpod, Paperspace, and Colab Pro.

big-AGI
big-AGI is an AI suite designed for professionals seeking function, form, simplicity, and speed. It offers best-in-class Chats, Beams, and Calls with AI personas, visualizations, coding, drawing, side-by-side chatting, and more, all wrapped in a polished UX. The tool is powered by the latest models from 12 vendors and open-source servers, providing users with advanced AI capabilities and a seamless user experience. With continuous updates and enhancements, big-AGI aims to stay ahead of the curve in the AI landscape, catering to the needs of both developers and AI enthusiasts.

generative-ai
This repository contains code related to Generative AI that accompanies a YouTube video series. It includes notebooks and files for different days, covering topics like map-reduce, text-to-SQL, LLM parameters, tagging, and a Kaggle competition. The repository also includes resources such as PDF files and databases for different Generative AI projects.

Cradle
The Cradle project is a framework designed for General Computer Control (GCC), empowering foundation agents to excel in various computer tasks through strong reasoning abilities, self-improvement, and skill curation. It provides a standardized environment with minimal requirements, constantly evolving to support more games and software. The repository includes released versions, publications, and relevant assets.

azure-functions-openai-extension
Azure Functions OpenAI Extension is a project that adds support for OpenAI LLM (GPT-3.5-turbo, GPT-4) bindings in Azure Functions. It provides NuGet packages for various functionalities like text completions, chat completions, assistants, embeddings generators, and semantic search. The project requires .NET 6 SDK or greater, Azure Functions Core Tools v4.x, and specific settings in Azure Function or local settings for development. It offers features like text completions, chat completion, assistants with custom skills, embeddings generators for text relatedness, and semantic search using vector databases. The project also includes examples in C# and Python for different functionalities.

rubra
Rubra is a collection of open-weight large language models enhanced with tool-calling capability. It allows users to call user-defined external tools in a deterministic manner while reasoning and chatting, making it ideal for agentic use cases. The models are further post-trained to teach instruct-tuned models new skills and mitigate catastrophic forgetting. Rubra extends popular inferencing projects for easy use, enabling users to run the models easily.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. It currently provides native support for OpenAI, Anthropic, Azure OpenAI, and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any production LLM use case. Use cases for BricksLLM include:
- Setting LLM usage limits for users on different pricing tiers
- Tracking LLM usage per user and per organization
- Blocking or redacting requests containing PII
- Improving LLM reliability with failovers, retries, and caching
- Distributing API keys with rate limits and cost limits for internal development/production use cases
- Distributing API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.