
Awesome-Embodied-AI
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
Stars: 349

Awesome-Embodied-AI is a curated list of papers on Embodied AI and related resources, tracking and summarizing research and industrial progress in the field. It includes surveys, workshops, tutorials, talks, blogs, and papers covering various aspects of Embodied AI, such as vision-language navigation, large language model-based agents, robotics, and more. The repository welcomes contributions and aims to provide a comprehensive overview of the advancements in Embodied AI.
README:
A curated list of awesome papers on Embodied AI and related research/industry-driven resources, inspired by awesome-computer-vision.
Embodied AI has seen a wave of new breakthroughs, and this repository will keep tracking and summarizing research and industrial progress in the field.
- Contributions are highly welcome; feel free to submit a pull request or contact me.
If you find this repository helpful, please consider starring ⭐ or sharing ⬆️ it.
- CVPR-Workshop
- ICCV-Workshop
- CS539-OregonStateUniversity
- ChatGPT for Robotics: Design Principles and Model Abilities
Please do consider this fantastic paper: Agent AI: Surveying the Horizons of Multimodal Interaction
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
- Vision-Language Navigation with Embodied Intelligence: A Survey
- The Rise and Potential of Large Language Model Based Agents: A Survey
- A Survey of Embodied AI: From Simulators to Research Tasks
- A Survey on LLM-based Autonomous Agents
- Mindstorms in Natural Language-Based Societies of Mind
- Lifelong Learning of Large Language Model based Agents: A Roadmap
- Data Interpreter: An LLM Agent For Data Science
- Communicative Agents for Software Development
- Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
- Experiential Co-Learning of Software-Developing Agents
- EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
- Embodied AI in education: A review on the body, environment, and mind
- Agent AI: Surveying the horizons of multimodal interaction
- Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues
- Alexa Arena: A user-centric interactive platform for embodied AI
- Artificial intelligence education for young children: A case study of technology‐enhanced embodied learning
- EmbodiedGPT: Vision-language pre-training via embodied chain of thought
- Multimodal embodied interactive agent for cafe scene
- The Essential Role of Causality in Foundation World Models for Embodied AI
- A Survey on Robotics with Foundation Models: toward Embodied AI
- Where are we in the search for an artificial visual cortex for embodied intelligence?
- A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents
- The sense of agency in human–AI interactions
- "Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations
- Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation
- Spatially-Aware Transformer Memory for Embodied Agents
- VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
- Embodied Human Activity Recognition
- LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents
- EDGI: Equivariant diffusion for planning with embodied agents
- Large Multimodal Agents: A Survey
- Egocentric Planning for Scalable Embodied Task Achievement
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
- Human-agent teams in VR and the effects on trust calibration
- Talk with Ted: an embodied conversational agent for caregivers
- MOPA: Modular Object Navigation With PointGoal Agents
- Embodied Conversational Agents for Chronic Diseases: Scoping Review
- Towards anatomy education with generative AI-based virtual assistants in immersive virtual reality environments
- Improving Knowledge Extraction from LLMs for Task Learning through Agent Analysis
- A Survey on Large Language Model-Based Game Agents
- AutoRT: Embodied foundation models for large scale orchestration of robotic agents
- Towards Heterogeneous Multi-Agent Systems in Space
- Embodied Machine Learning
- Penetrative AI: Making LLMs comprehend the physical world
- WebVLN: Vision-and-Language Navigation on Websites
- Generating meaning: active inference and the scope and limits of passive AI
- RoboHive: A Unified Framework for Robot Learning
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
- Turing Test in the Era of LLM
- Generative Models for Decision Making
- AgentScope: A Flexible yet Robust Multi-Agent Platform
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
- MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion
- An Interactive Agent Foundation Model
- UFO: A UI-Focused Agent for Windows OS Interaction
- Thanks to GT-RIPL's repository
- Thanks to Jacob Rintamaki's repository
- Thanks to Jiankai-Sun's repository
- Thanks to Yafei Hu's repository
- Thanks to Changan's repository
- Thanks to Rui's repository
- AutoGen, EcoOptiGen
- AgentTuning: Enabling Generalized Agent Abilities For LLMs
- AgentBench: Evaluating LLMs as Agents
- An Open-source Framework for Autonomous Language Agents
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
- Embodied Task Planning with Large Language Models
- Building Cooperative Embodied Agents Modularly with Large Language Models
- State-Maintaining Language Models for Embodied Reasoning
- Embodied Executable Policy Learning with Language-based Scene Summarization
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning
- Vision-Language Tasks
- Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf
- Language Guided Generation of 3D Embodied AI Environments
- CogAgent: Visual Expert for Pretrained Language Models
- ProAgent: from Robotic Process Automation to Agentic Process Automation
- Waymax: An accelerated simulator for autonomous driving research
- How Far Are Large Language Models from Agents with Theory-of-Mind?
- MindAgent: Emergent Gaming Interaction
- Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
- Emergent Communication for Embodied Control
- Simple but Effective: CLIP Embeddings for Embodied AI
- Embodied AI-Driven Operation of Smart Cities: A Concise Review
- Modeling Dynamic Environments with Scene Graph Memory
Alternative AI tools for Awesome-Embodied-AI
Similar Open Source Tools

Awesome-Papers-Autonomous-Agent
Awesome-Papers-Autonomous-Agent is a curated collection of recent papers on autonomous agents, with a particular focus on RL-based and LLM-based agents. The repository aims to provide a comprehensive resource for researchers and practitioners interested in intelligent agents that can achieve goals, acquire knowledge, and continually improve. The collection covers topics such as instruction following, building agents based on world models, using language as knowledge, leveraging LLMs as tools, generalization across tasks, continual learning, combining RL and LLMs, transformer-based policies, trajectory-to-language, trajectory prediction, multimodal agents, training LLMs for generalization and adaptation, task-specific design, multi-agent systems, experimental analysis, benchmarking, applications, and algorithm design.

rlhf_thinking_model
This repository is a collection of research notes and resources focusing on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It includes methodologies, techniques, and state-of-the-art approaches for optimizing preferences and model alignment in LLM training. The purpose is to serve as a reference for researchers and engineers interested in reinforcement learning, large language models, model alignment, and alternative RL-based methods.

awesome-gpt-prompt-engineering
Awesome GPT Prompt Engineering is a curated list of resources, tools, and shiny things for GPT prompt engineering. It includes roadmaps, guides, techniques, prompt collections, papers, books, communities, prompt generators, Auto-GPT related tools, prompt injection information, ChatGPT plug-ins, prompt engineering job offers, and AI links directories. The repository aims to provide a comprehensive guide for prompt engineering enthusiasts, covering various aspects of working with GPT models and improving communication with AI tools.

cheat-sheet-pdf
The Cheat-Sheet Collection for DevOps, Engineers, IT professionals, and more is a curated list of cheat sheets for various tools and technologies commonly used in the software development and IT industry. It includes cheat sheets for Nginx, Docker, Ansible, Python, Go (Golang), Git, Regular Expressions (Regex), PowerShell, VIM, Jenkins, CI/CD, Kubernetes, Linux, Redis, Slack, Puppet, Google Cloud Developer, AI, Neural Networks, Machine Learning, Deep Learning & Data Science, PostgreSQL, Ajax, AWS, Infrastructure as Code (IaC), System Design, and Cyber Security.

awesome-ai-coding
Awesome-AI-Coding is a curated list of AI coding topics, projects, datasets, LLM models, embedding models, papers, blogs, products, startups, and peer awesome lists related to artificial intelligence in coding. It includes tools for code completion, code generation, code documentation, and code search, as well as AI models and techniques for improving developer productivity. The repository also features information on various AI-powered developer tools, copilots, and related resources in the AI coding domain.

fastRAG
fastRAG is a research framework for building and exploring efficient retrieval-augmented generative models. It combines state-of-the-art Large Language Models (LLMs) and Information Retrieval to give researchers and developers a comprehensive toolset for advancing retrieval-augmented generation. The framework is optimized for Intel hardware and is customizable; its key features include optimized RAG pipelines and efficient RAG components such as ColBERT and Fusion-in-Decoder (FiD). fastRAG supports various components and backends for running LLMs, making it a versatile tool for research and development in retrieval-augmented generation.

awesome-khmer-language
Awesome Khmer Language is a comprehensive collection of resources for the Khmer language, including tools, datasets, research papers, projects/models, blogs/slides, and miscellaneous items. It covers a wide range of topics related to Khmer language processing, such as character normalization, word segmentation, part-of-speech tagging, optical character recognition, text-to-speech, and more. The repository aims to support the development of natural language processing applications for the Khmer language by providing a diverse set of resources and tools for researchers and developers.

AI-Bootcamp
The AI Bootcamp is a comprehensive training program focusing on real-world applications to equip individuals with the skills and knowledge needed to excel as AI engineers. The bootcamp covers topics such as Real-World PyTorch, Machine Learning Projects, Fine-tuning Tiny LLM, Deployment of LLM to Production, AI Agents with GPT-4 Turbo, CrewAI, Llama 3, and more. Participants will learn foundational skills in Python for AI, ML Pipelines, Large Language Models (LLMs), AI Agents, and work on projects like RagBase for private document chat.

machine-learning-research
The 'machine-learning-research' repository is a comprehensive collection of resources related to mathematics, machine learning, deep learning, artificial intelligence, data science, and various scientific fields. It includes materials such as courses, tutorials, books, podcasts, communities, online courses, papers, and dissertations. The repository covers topics ranging from fundamental math skills to advanced machine learning concepts, with a focus on applications in healthcare, genetics, computational biology, precision health, and AI in science. It serves as a valuable resource for individuals interested in learning and researching in the fields of machine learning and related disciplines.

awesome-flux-ai
Awesome Flux AI is a curated list of resources, tools, libraries, and applications related to Flux AI technology. It serves as a comprehensive collection for developers, researchers, and enthusiasts interested in Flux AI. The platform offers open-source text-to-image AI models developed by Black Forest Labs, aiming to advance generative deep learning models for media, creativity, efficiency, and diversity.

SoM-LLaVA
SoM-LLaVA is a new data source and learning paradigm for Multimodal LLMs that equips open-source Multimodal LLMs with Set-of-Mark (SoM) prompting and improved visual reasoning ability. The repository provides a new dataset that complements existing training sources and improves the general capacity of multimodal LLMs. By adding 30k SoM samples to the visual instruction tuning stage of LLaVA, the tool achieves 1% to 6% relative improvements on all benchmarks. Users can train SoM-LLaVA from the command line and use the provided implementation to annotate COCO images with SoM. The model can also be loaded via Hugging Face for further use.

Open-Medical-Reasoning-Tasks
Open Life Science AI: Medical Reasoning Tasks is a collaborative hub for developing cutting-edge reasoning tasks for Large Language Models (LLMs) in the medical, healthcare, and clinical domains. The repository aims to advance AI capabilities in healthcare by fostering accurate diagnoses, personalized treatments, and improved patient outcomes. It offers a diverse range of medical reasoning challenges such as Diagnostic Reasoning, Treatment Planning, Medical Image Analysis, Clinical Data Interpretation, Patient History Analysis, Ethical Decision Making, Medical Literature Comprehension, and Drug Interaction Assessment. Contributors can join the community of healthcare professionals, AI researchers, and enthusiasts to contribute to the repository by creating new tasks or improvements following the provided guidelines. The repository also provides resources including a task list, evaluation metrics, medical AI papers, and healthcare datasets for training and evaluation.

llm-rag-vectordb-python
This repository provides sample applications and tutorials that showcase Amazon Bedrock with Python. It helps Python developers understand how to use Amazon Bedrock to build generative AI-enabled applications. The resources also demonstrate integration with vector databases using RAG (Retrieval-Augmented Generation) and services such as Amazon Aurora, RDS, and OpenSearch. Additionally, it explores using LangChain and Streamlit to build experimental applications.

multi-agent-orchestrator
Multi-Agent Orchestrator is a flexible and powerful framework for managing multiple AI agents and handling complex conversations. It intelligently routes queries to the most suitable agent based on context and content, and it is implemented in both Python and TypeScript. It offers flexible agent responses, context management across agents, an extensible architecture for customization, universal deployment options, and pre-built agents and classifiers. It suits a wide range of applications, from simple chatbots to sophisticated AI systems, and scales efficiently to diverse requirements.

Streamline-Analyst
Streamline Analyst is a cutting-edge, open-source application powered by Large Language Models (LLMs) designed to revolutionize data analysis. This Data Analysis Agent effortlessly automates tasks such as data cleaning, preprocessing, and complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on your data. With Streamline Analyst, results visualization and evaluation become seamless. It aims to expedite the data analysis process, making it accessible to all, regardless of their expertise in data analysis. The tool is built to empower users to process data and achieve high-quality visualizations with unparalleled efficiency, and to execute high-performance modeling with the best strategies. Future enhancements include Natural Language Processing (NLP), neural networks, and object detection utilizing YOLO, broadening its capabilities to meet diverse data analysis needs.
For similar tasks

genai-os
Kuwa GenAI OS is an open, free, secure, and privacy-focused Generative-AI Operating System. It provides a multi-lingual turnkey solution for GenAI development and deployment on Linux and Windows. Users can enjoy features such as concurrent multi-chat, quoting, full prompt-list import/export/share, and flexible orchestration of prompts, RAGs, bots, models, and hardware/GPUs. The system supports various environments from virtual hosts to cloud, and it is open source, allowing developers to contribute and customize according to their needs.

Neurite
Neurite is an innovative project that combines chaos theory and graph theory to create a digital interface that explores hidden patterns and connections for creative thinking. It offers a unique workspace blending fractals with mind mapping techniques, allowing users to navigate the Mandelbrot set in real-time. Nodes in Neurite represent various content types like text, images, videos, code, and AI agents, enabling users to create personalized microcosms of thoughts and inspirations. The tool supports synchronized knowledge management through bi-directional synchronization between mind-mapping and text-based hyperlinking. Neurite also features FractalGPT for modular conversation with AI, local AI capabilities for multi-agent chat networks, and a Neural API for executing code and sequencing animations. The project is actively developed with plans for deeper fractal zoom, advanced control over node placement, and experimental features.

fast-stable-diffusion
Fast-stable-diffusion provides notebooks adapted for RunPod, Paperspace, and Colab Pro, with the AUTOMATIC1111 WebUI and Dreambooth. It provides tools for running Dreambooth, a fine-tuning technique for Stable Diffusion models. The project includes implementations by XavierXiao and is sponsored by RunPod, Paperspace, and Colab Pro.

big-AGI
big-AGI is an AI suite designed for professionals seeking function, form, simplicity, and speed. It offers best-in-class Chats, Beams, and Calls with AI personas, visualizations, coding, drawing, side-by-side chatting, and more, all wrapped in a polished UX. The tool is powered by the latest models from 12 vendors and open-source servers, providing users with advanced AI capabilities and a seamless user experience. With continuous updates and enhancements, big-AGI aims to stay ahead of the curve in the AI landscape, catering to the needs of both developers and AI enthusiasts.

generative-ai
This repository contains Generative AI code accompanying a series of YouTube videos. It includes notebooks and files organized by day, covering topics such as map-reduce, text-to-SQL, LLM parameters, tagging, and a Kaggle competition. It also includes resources such as PDF files and databases for the various Generative AI projects.

Cradle
The Cradle project is a framework designed for General Computer Control (GCC), empowering foundation agents to excel in various computer tasks through strong reasoning abilities, self-improvement, and skill curation. It provides a standardized environment with minimal requirements, constantly evolving to support more games and software. The repository includes released versions, publications, and relevant assets.

azure-functions-openai-extension
Azure Functions OpenAI Extension is a project that adds support for OpenAI LLM (GPT-3.5-turbo, GPT-4) bindings in Azure Functions. It provides NuGet packages for text completions, chat completions, assistants with custom skills, embeddings generators for text relatedness, and semantic search using vector databases. The project requires the .NET 6 SDK or later, Azure Functions Core Tools v4.x, and specific settings in the Azure Function app or in local settings for development. It also includes examples in C# and Python for the different features.

rubra
Rubra is a collection of open-weight large language models enhanced with tool-calling capability. It allows users to call user-defined external tools in a deterministic manner while reasoning and chatting, making it ideal for agentic use cases. The models are further post-trained to teach instruct-tuned models new skills and mitigate catastrophic forgetting. Rubra extends popular inferencing projects for easy use, enabling users to run the models easily.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Here are some use cases for BricksLLM:
- Set LLM usage limits for users on different pricing tiers
- Track LLM usage on a per-user and per-organization basis
- Block or redact requests containing PII
- Improve LLM reliability with failovers, retries and caching
- Distribute API keys with rate limits and cost limits for internal development/production use cases
- Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.