Embodied_AI_Paper_List

[Embodied-AI-Survey-2024] Awesome Paper list for Embodied AI and its related projects and applications

Stars: 143

README:

Paper list for Embodied AI

We welcome suggestions from peers for improving this paper list or survey. Please raise an issue or send an email to [email protected] and [email protected]. Thanks for your cooperation!

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Yang Liu, Weixing Chen, Yongjie Bai, Guanbin Li, Wen Gao, Liang Lin

💥 Update Log

[2024.07.22] We have updated the paper list and other useful embodied projects!
[2024.07.10] We release the first version of the survey on Embodied AI PDF!
[2024.07.10] We release the first version of the paper list for Embodied AI. This page is continually updated!

Books & Surveys 🔝

Multimodal Large Models: The New Paradigm of Artificial General Intelligence, Publishing House of Electronics Industry (PHE), 2024
Yang Liu, Liang Lin
[Page]
A Survey on Vision-Language-Action Models for Embodied AI, arXiv:2405.14093, 2024
Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, Irwin King
[Paper]
Towards Generalist Robot Learning from Internet Video: A Survey, arXiv:2404.19664, 2024
McCarthy, Robert, Daniel CH Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, and Zhibin Li.
[Paper]
A Survey on Robotics with Foundation Models: toward Embodied AI, arXiv:2402.02385, 2024
Zhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che, and Jian Tang.
[Paper]
Toward general-purpose robots via foundation models: A survey and meta-analysis, arXiv:2312.08782, 2023
Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim et al.
[Paper]
A survey of embodied ai: From simulators to research tasks, IEEE Transactions on Emerging Topics in Computational Intelligence, 2022
Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan
[Paper]
The development of embodied cognition: Six lessons from babies, Artificial life, 2005
Linda Smith, Michael Gasser
[Paper]
Embodied artificial intelligence: Trends and challenges, Lecture notes in computer science, 2004
Rolf Pfeifer, Fumiya Iida
[Paper]

Embodied Simulators 🔝

General Simulator

Nvidia isaac sim: Robotics simulation and synthetic data, NVIDIA, 2023 [page]
Design and use paradigms for gazebo, an open-source multi-robot simulator, IROS, 2004 Koenig, Nathan, Andrew, Howard. [page]
Pybullet, a python module for physics simulation for games, robotics and machine learning, 2016 Coumans, Erwin, Yunfei, Bai.
Webots: open-source robot simulator Cyberbotics [page, code]
MuJoCo: A physics engine for model-based control, IROS, 2012 Todorov, Emanuel, Tom, Erez, Yuval, Tassa. [page, code]
Unity: A general platform for intelligent agents, ArXiv, 2020 Juliani, Arthur, Vincent-Pierre, Berges, Ervin, Teng, Andrew, Cohen, Jonathan, Harper, Chris, Elion, Chris, Goy, Yuan, Gao, Hunter, Henry, Marwan, Mattar, Danny, Lange. [page]
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles, Field and Service Robotics, 2017 Shital Shah, Debadeepta Dey, Chris Lovett, Ashish Kapoor. [page]
Modular open robots simulation engine: Morse, ICRA, 2011 Echeverria, Gilberto and Lassabe, Nicolas and Degroote, Arnaud and Lemaignan, Séverin [page]
V-REP: A versatile and scalable robot simulation framework, IROS, 2013 Rohmer, Eric, Surya PN, Singh, Marc, Freese. [page]

Real-Scene Based Simulators

ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation, NeurIPS, 2021
Gan, Chuang, J., Schwartz, Seth, Alter, Martin, Schrimpf, James, Traer, JulianDe, Freitas, Jonas, Kubilius, Abhishek, Bhandwaldar, Nick, Haber, Megumi, Sano, Kuno, Kim, Elias, Wang, Damian, Mrowca, Michael, Lingelbach, Aidan, Curtis, KevinT., Feigelis, DavidM., Bear, Dan, Gutfreund, DavidD., Cox, JamesJ., DiCarlo, JoshH., McDermott, JoshuaB., Tenenbaum, Daniel, Yamins.
[page]
iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes, IROS, 2021
Shen, Bokui, Fei, Xia, Chengshu, Li, Roberto, Martín-Martín, Linxi, Fan, Guanzhi, Wang, Claudia, Pérez-D’Arpino, Shyamal, Buch, Sanjana, Srivastava, Lyne, Tchapmi, Micael, Tchapmi, Kent, Vainio, Josiah, Wong, Li, Fei-Fei, Silvio, Savarese.
[page]
SAPIEN: A SimulAted Part-Based Interactive ENvironment, CVPR, 2020
Xiang, Fanbo, Yuzhe, Qin, Kaichun, Mo, Yikuan, Xia, Hao, Zhu, Fangchen, Liu, Minghua, Liu, Hanxiao, Jiang, Yifu, Yuan, He, Wang, Li, Yi, Angel X., Chang, Leonidas J., Guibas, Hao, Su.
[page]
Habitat: A Platform for Embodied AI Research, ICCV, 2019
Savva, Manolis, Abhishek, Kadian, Oleksandr, Maksymets, Yili, Zhao, Erik, Wijmans, Bhavana, Jain, Julian, Straub, Jia, Liu, Vladlen, Koltun, Jitendra, Malik, Devi, Parikh, Dhruv, Batra.
[page]
VirtualHome: Simulating Household Activities Via Programs, CVPR, 2018
Puig, Xavier, Kevin, Ra, Marko, Boben, Jiaman, Li, Tingwu, Wang, Sanja, Fidler, Antonio, Torralba.
[page]
Matterport3D: Learning from RGB-D Data in Indoor Environments, 3DV, 2017
Chang, Angel, Angela, Dai, Thomas, Funkhouser, Maciej, Halber, Matthias, Nießner, Manolis, Savva, Shuran, Song, Andy, Zeng, Yinda, Zhang.
[page]
AI2-THOR: An Interactive 3D Environment for Visual AI. arXiv, 2017
Kolve, Eric, Roozbeh, Mottaghi, Daniel, Gordon, Yuke, Zhu, Abhinav, Gupta, Ali, Farhadi.
[page]
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation, NeurIPS, 2022
Deitke, VanderBilt, Herrasti, Weihs, Salvador, Ehsani, Han, Kolve, Farhadi, Kembhavi, Mottaghi
[page]
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, arXiv, 2023
Wang, Yufei, Zhou, Xian, Feng, Chen, Tsun-Hsuan, Wang, Yian, Wang, Katerina, Fragkiadaki, Zackory, Erickson, David, Held, Chuang, Gan.
[page]
Holodeck: Language Guided Generation of 3D Embodied AI Environments, CVPR, 2024
Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark.
[page]
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI, CVPR, 2024
Yang, Yandan, Baoxiong, Jia, Peiyuan, Zhi, Siyuan, Huang.
[page]

Embodied Perception 🔝

Active Visual Exploration

MonoSLAM: Real-time single camera SLAM, IEEE T-PAMI 29. 6(2007): 1052–1067
Davison, Andrew J, Ian D, Reid, Nicholas D, Molton, Olivier, Stasse.
[page]
A multi-state constraint Kalman filter for vision-aided inertial navigation, IROS, 2007
Mourikis, Anastasios I, Stergios I, Roumeliotis.
[page]
Parallel tracking and mapping for small AR workspaces, ISMAR, 2007
Klein, Georg, David, Murray.
[page]
ORB-SLAM: a versatile and accurate monocular SLAM system, IEEE T-RO 31. 5(2015): 1147–1163
Mur-Artal, Raul, Jose Maria Martinez, Montiel, Juan D, Tardos.
[page]
DTAM: Dense tracking and mapping in real-time, ICCV, 2011
Newcombe, Richard A, Steven J, Lovegrove, Andrew J, Davison.
[page]
LSD-SLAM: Large-scale direct monocular SLAM, ECCV, 2014
Engel, Jakob, Thomas, Schops, Daniel, Cremers.
[page]
Slam++: Simultaneous localisation and mapping at the level of objects, CVPR, 2013
Salas-Moreno, Renato F, Richard A, Newcombe, Hauke, Strasdat, Paul HJ, Kelly, Andrew J, Davison.
[page]
Cubeslam: Monocular 3-d object slam, IEEE T-RO 35. 4(2019): 925–938
Yang, Shichao, Sebastian, Scherer.
[page]
Hierarchical topic model based object association for semantic SLAM, IEEE T-VCG 25. 11(2019): 3052–3062
Zhang, Jianhua, Mengping, Gui, Qichao, Wang, Ruyu, Liu, Junzhe, Xu, Shengyong, Chen.
[page]
Quadricslam: Dual quadrics from object detections as landmarks in object-oriented slam, IEEE Robotics and Automation Letters 4. 1(2018): 1–8.
Nicholson, Lachlan, Michael, Milford, Niko, Sünderhauf.
[page]
So-slam: Semantic object slam with scale proportional and symmetrical texture constraints. IEEE Robotics and Automation Letters 7. 2(2022): 4008–4015.
Liao, Ziwei, Yutong, Hu, Jiadong, Zhang, Xianyu, Qi, Xiaoyu, Zhang, Wei, Wang.
[page]
DS-SLAM: A semantic visual SLAM towards dynamic environments, IROS, 2018
Yu, Chao, Zuxin, Liu, Xin-Jun, Liu, Fugui, Xie, Yi, Yang, Qi, Wei, Qiao, Fei.
[page]
DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes, IEEE Robotics and Automation Letters 3. 4(2018): 4076–4083
Bescos, Berta, José M, Facil, Javier, Civera, José, Neira.
[page]
SG-SLAM: A real-time RGB-D visual SLAM toward dynamic scenes with semantic and geometric information, IEEE Transactions on Instrumentation and Measurement 72. (2022): 1–12. Cheng, Shuhong, Changhe, Sun, Shijun, Zhang, Dianfan, Zhang. [page]
OVD-SLAM: An online visual SLAM for dynamic environments, IEEE Sensors Journal, 2023. He, Jiaming, Mingrui, Li, Yangyang, Wang, Hongyu, Wang. [page]
Gs-slam: Dense visual slam with 3d gaussian splatting, CVPR, 2024. Yan, Chi, Delin, Qu, Dan, Xu, Bin, Zhao, Zhigang, Wang, Dong, Wang, Xuelong, Li. [page]
Multi-view 3d object detection network for autonomous driving, CVPR, 2017. Chen, Xiaozhi, Huimin, Ma, Ji, Wan, Bo, Li, Tian, Xia. [page]
Pointpillars: Fast encoders for object detection from point clouds, CVPR, 2019. Lang, Alex H, Sourabh, Vora, Holger, Caesar, Lubing, Zhou, Jiong, Yang, Oscar, Beijbom. [page]
Multi-view convolutional neural networks for 3d shape recognition, ICCV, 2015. Su, Hang, Subhransu, Maji, Evangelos, Kalogerakis, Erik, Learned-Miller. [page]
Voxnet: A 3d convolutional neural network for real-time object recognition, IROS, 2015. Maturana, Daniel, Sebastian, Scherer. [page]
Semantic scene completion from a single depth image, CVPR, 2017. Song, Shuran, Fisher, Yu, Andy, Zeng, Angel X, Chang, Manolis, Savva, Thomas, Funkhouser. [page]
4d spatio-temporal convnets: Minkowski convolutional neural networks, CVPR, 2019. Choy, Christopher, JunYoung, Gwak, Silvio, Savarese. [page]
3d semantic segmentation with submanifold sparse convolutional networks, CVPR, 2018. Graham, Benjamin, Martin, Engelcke, Laurens, Van Der Maaten. [page]
Embodiedscan: A holistic multi-modal 3d perception suite towards embodied ai, CVPR, 2024. Wang, Tai, Xiaohan, Mao, Chenming, Zhu, Runsen, Xu, Ruiyuan, Lyu, Peisen, Li, Xiao, Chen, Wenwei, Zhang, Kai, Chen, Tianfan, Xue, others. [page]
Pointnet: Deep learning on point sets for 3d classification and segmentation, CVPR, 2017. Qi, Charles R, Hao, Su, Kaichun, Mo, Leonidas J, Guibas. [page]
Pointnet++: Deep hierarchical feature learning on point sets in a metric space, NeurIPS, 2017 Qi, Charles Ruizhongtai, Li, Yi, Hao, Su, Leonidas J, Guibas. [page]
Rethinking network design and local geometry in point cloud: A simple residual MLP framework, arXiv, 2022. Ma, Xu, Can, Qin, Haoxuan, You, Haoxi, Ran, Yun, Fu. [page]
Point transformer, ICCV, 2021. Zhao, Hengshuang, Li, Jiang, Jiaya, Jia, Philip HS, Torr, Vladlen, Koltun. [page]
Swin3d: A pretrained transformer backbone for 3d indoor scene understanding, arXiv, 2023. Yang, Yu-Qi, Yu-Xiao, Guo, Jian-Yu, Xiong, Yang, Liu, Hao, Pan, Peng-Shuai, Wang, Xin, Tong, Baining, Guo. [page]
Point transformer v2: Grouped vector attention and partition-based pooling, NeurIPS, 2022 Wu, Xiaoyang, Yixing, Lao, Li, Jiang, Xihui, Liu, Hengshuang, Zhao. [page]
Point Transformer V3: Simpler Faster Stronger, CVPR, 2024. Wu, Xiaoyang, Li, Jiang, Peng-Shuai, Wang, Zhijian, Liu, Xihui, Liu, Yu, Qiao, Wanli, Ouyang, Tong, He, Hengshuang, Zhao. [page]
PointMamba: A Simple State Space Model for Point Cloud Analysis, arXiv, 2024. Liang, Dingkang, Xin, Zhou, Xinyu, Wang, Xingkui, Zhu, Wei, Xu, Zhikang, Zou, Xiaoqing, Ye, Xiang, Bai. [page]
Point Cloud Mamba: Point Cloud Learning via State Space Model, arXiv, 2024. Zhang, Tao, Xiangtai, Li, Haobo, Yuan, Shunping, Ji, Shuicheng, Yan. [page]
Mamba3d: Enhancing local features for 3d point cloud analysis via state space model, arXiv, 2024. Han, Xu, Yuan, Tang, Zhaoxuan, Wang, Xianzhi, Li. [page]
The curious robot: Learning visual representations via physical interactions, ECCV, 2016. Pinto, Lerrel, Dhiraj, Gandhi, Yuanfeng, Han, Yong-Lae, Park, Abhinav, Gupta. [page]
Transferring implicit knowledge of non-visual object properties across heterogeneous robot morphologies, ICRA, 2023. Tatiya, Gyan, Jonathan, Francis, Jivko, Sinapov. [page]
Learning to look around: Intelligently exploring unseen environments for unknown tasks, CVPR, 2018. Jayaraman, Dinesh, Kristen, Grauman. [page]
Neu-nbv: Next best view planning using uncertainty estimation in image-based neural rendering, IROS, 2023. Jin, Liren, Xieyuanli, Chen, Julius, Rückin, Marija, Popović. [page]
Off-policy evaluation with online adaptation for robot exploration in challenging environments, IEEE Robotics and Automation Letters, 2023. Hu, Yafei, Junyi, Geng, Chen, Wang, John, Keller, Sebastian, Scherer. [page]
Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception, CVPR, 2024. Fan, Lei, Mingfu, Liang, Yunxuan, Li, Gang, Hua, Ying, Wu. [page]

3D Visual Grounding

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language, ECCV, 2020 Chen, Dave Zhenyu and Chang, Angel X and Nießner, Matthias [page]
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes, ECCV, 2020 Achlioptas, Panos and Abdelreheem, Ahmed and Xia, Fei and Elhoseiny, Mohamed and Guibas, Leonidas [page]
Text-guided graph neural networks for referring 3D instance segmentation, AAAI, 2021 Huang, Pin-Hao and Lee, Han-Hung and Chen, Hwann-Tzong and Liu, Tyng-Luh [page]
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring, ICCV, 2021 Yuan, Zhihao and Yan, Xu and Liao, Yinghong and Zhang, Ruimao and Wang, Sheng and Li, Zhen and Cui, Shuguang [page]
Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud, CVPR, 2021 Feng, Mingtao and Li, Zhen and Li, Qi and Zhang, Liang and Zhang, XiangDong and Zhu, Guangming and Zhang, Hui and Wang, Yaonan and Mian, Ajmal [page]
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, CVPR, 2021 Yang, Zhengyuan and Zhang, Songyang and Wang, Liwei and Luo, Jiebo [page]
LanguageRefer: Spatial-Language Model for 3D Visual Grounding, CVPR, 2021 Roh, Junha and Desingh, Karthik and Farhadi, Ali and Fox, Dieter [page]
3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds, ICCV, 2021 Zhao, Lichen and Cai, Daigang and Sheng, Lu and Xu, Dong [page]
TransRefer3D: Entity-and-relation aware transformer for fine-grained 3D visual grounding, CVPR, 2021 He, Dailan and Zhao, Yusheng and Luo, Junyu and Hui, Tianrui and Huang, Shaofei and Zhang, Aixi and Liu, Si [page]
Multi-view transformer for 3D visual grounding, CVPR, 2022 Huang, Shijia and Chen, Yilun and Jia, Jiaya and Wang, Liwei [page]
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding, CVPR, 2022 Bakr, Eslam and Alsaedy, Yasmeen and Elhoseiny, Mohamed [page]
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent, arXiv, 2023 Yang, Jianing and Chen, Xuweiyi and Qian, Shengyi and Madaan, Nikhil and Iyengar, Madhavan and Fouhey, David F and Chai, Joyce [page]
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding, arXiv, 2023 Yuan, Zhihao and Ren, Jinke and Feng, Chun-Mei and Zhao, Hengshuang and Cui, Shuguang and Li, Zhen [page]
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection, CVPR, 2022 Luo, Junyu and Fu, Jiahui and Kong, Xianghao and Gao, Chen and Ren, Haibing and Shen, Hao and Xia, Huaxia and Liu, Si [page]
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds, ECCV, 2022 Jain, Ayush and Gkanatsios, Nikolaos and Mediratta, Ishita and Fragkiadaki, Katerina [page]
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding, CVPR, 2023 Wu, Yanmin and Cheng, Xinhua and Zhang, Renrui and Cheng, Zesen and Zhang, Jian [page]
3d-vista: Pre-trained transformer for 3d vision and text alignment, ICCV, 2023
Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, and Qing Li
[page]
SQA3D: Situated Question Answering in 3D Scenes, ICLR, 2023
Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, and Siyuan Huang
[page]
LEO: An Embodied Generalist Agent in 3D World, ICML, 2024
Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, and Siyuan Huang
[page]
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding, ECCV, 2024
Baoxiong Jia, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, and Siyuan Huang
[page]
PQ3D: Unifying 3D Vision-Language Understanding via Promptable Queries, ECCV, 2024
Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, and Qing Li
[page]

Visual Language Navigation

Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, CVPR, 2018. Anderson, Peter, Qi, Wu, Damien, Teney, Jake, Bruce, Mark, Johnson, Niko, Sunderhauf, Ian, Reid, Stephen, Gould, Anton, Hengel. [page]
Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation, ACL, 2019. Jain, Vihan, Gabriel, Magalhaes, Alexander, Ku, Ashish, Vaswani, Eugene, Ie, Jason, Baldridge. [page]
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments, ECCV, 2020, Krantz, Jacob and Wijmans, Erik and Majumdar, Arjun and Batra, Dhruv and Lee, Stefan. [page]
TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments, CVPR, 2019. Chen, Howard, Alane, Suhr, Dipendra, Misra, Noah, Snavely, Yoav, Artzi. [page]
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments, CVPR, 2020. Qi, Yuankai, Qi, Wu, Peter, Anderson, Xin, Wang, William Yang, Wang, Chunhua, Shen, Anton, Hengel. [page]
SOON: Scenario Oriented Object Navigation with Graph-based Exploration, CVPR, 2021. Zhu, Fengda, Xiwen, Liang, Yi, Zhu, Qizhi, Yu, Xiaojun, Chang, Xiaodan, Liang. [page]
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation, NeurIPS, 2023. Wang, Chen, Li, Wu, Dong. [page]
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks, CVPR, 2020. Shridhar, Mohit, Jesse, Thomason, Daniel, Gordon, Yonatan, Bisk, Winson, Han, Roozbeh, Mottaghi, Luke, Zettlemoyer, Dieter, Fox. [page]
HomeRobot: Open-Vocabulary Mobile Manipulation, NeurIPS, 2023. Yenamandra, Sriram, Arun, Ramachandran, Karmesh, Yadav, Austin, Wang, Mukul, Khanna, Theophile, Gervet, Tsung-Yen, Yang, Vidhi, Jain, AlexanderWilliam, Clegg, John, Turner, Zsolt, Kira, Manolis, Savva, Angel, Chang, DevendraSingh, Chaplot, Dhruv, Batra, Roozbeh, Mottaghi, Yonatan, Bisk, Chris, Paxton. [page]
Behavior-1k: A benchmark for embodied ai with 1,000 everyday activities and realistic simulation, Conference on Robot Learning. 2023. Li, Chengshu, Ruohan, Zhang, Josiah, Wong, Cem, Gokmen, Sanjana, Srivastava, Roberto, Martín-Martín, Chen, Wang, Gabrael, Levine, Michael, Lingelbach, Jiankai, Sun, others. [page]
Vision-and-dialog navigation, Conference on Robot Learning. 2020. Thomason, Jesse, Michael, Murray, Maya, Cakmak, Luke, Zettlemoyer. [page]
DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following, arXiv, 2022. Gao, Xiaofeng, Qiaozi, Gao, Ran, Gong, Kaixiang, Lin, Govind, Thattai, GauravS., Sukhatme. [page]
Language and visual entity relationship graph for agent navigation, NeurIPS, 2020. Hong, Yicong, Cristian, Rodriguez, Yuankai, Qi, Qi, Wu, Stephen, Gould. [page]
Language-Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning, IEEE T-CSVT 31. (2020): 3469-3481. Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang. [page]
Vision-Language Navigation Policy Learning and Adaptation, IEEE T-PAMI 43. 12(2021): 4205-4216. Wang, Xin, Qiuyuan, Huang, Asli, Celikyilmaz, Jianfeng, Gao, Dinghan, Shen, Yuan-Fang, Wang, William Yang, Wang, Lei, Zhang. [page]
FILM: Following Instructions in Language with Modular Methods, ICLR, 2022. So Yeon Min, Devendra Singh Chaplot, Pradeep Kumar Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov. [page]
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, Conference on Robot Learning. 2022. Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine. [page]
HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation, CVPR, 2022. Qiao, Yanyuan, Yuankai, Qi, Yicong, Hong, Zheng, Yu, Peng, Wang, Qi, Wu. [page]
Towards Learning a Generalist Model for Embodied Navigation, CVPR, 2024. Duo Zheng, Shijia Huang, Lin Zhao, Yiwu Zhong, Liwei Wang. [page]
Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation, ICML, 2024. Junyu Gao, Xuan Yao, Changsheng Xu. [page]
Discuss before moving: Visual language navigation via multi-expert discussions, ICRA, 2024. Long, Yuxing, Xiaoqi, Li, Wenzhe, Cai, Hao, Dong. [page]
Vision-and-Language Navigation via Causal Learning, CVPR, 2024. Liuyi Wang, Qijun Chen. [page]
Volumetric Environment Representation for Vision-Language Navigation, CVPR, 2024. Rui Liu, Yi Yang. [page]
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation, ArXiv, 2024. Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, Wang He. [page]
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation, ECCV, 2018. Xin Eric Wang, Wenhan Xiong, Hongmin Wang, William Yang Wang. [page]
Neighbor-view enhanced model for vision and language navigation, MM, 2021. An, Dong, Yuankai, Qi, Yan, Huang, Qi, Wu, Liang, Wang, Tieniu, Tan. [page]
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation, CVPR, 2022. Hong, Yicong, Zun, Wang, Qi, Wu, Stephen, Gould. [page]
March in Chat: Interactive Prompting for Remote Embodied Referring Expression, ICCV, 2023. Qiao, Yanyuan, Yuankai, Qi, Zheng, Yu, Jing, Liu, Qi, Wu. [page]
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation, CVPR 2024. Wang, Zihan, Xiangyang, Li, Jiahao, Yang, Yeqi, Liu, Junjie, Hu, Ming, Jiang, Shuqiang, Jiang. [page]
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments, IEEE T-PAMI, 2024. An, Dong, Hanqing, Wang, Wenguan, Wang, Zun, Wang, Yan, Huang, Keji, He, Liang, Wang. [page]
Multi-level compositional reasoning for interactive instruction following, AAAI, 2023. Bhambri, Suvaansh, Byeonghwi, Kim, Jonghyun, Choi. [page]
Vision and Language Navigation in the Real World via Online Visual Language Mapping, ArXiv, 2023. Chengguang Xu, Hieu T. Nguyen, Christopher Amato, Lawson L.S. Wong. [page]
Embodied Instruction Following in Unknown Environments, ArXiv, 2024. Wu, Wang, Xu, Lu, Yan. [page]
Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill, ICRA, 2024.
Wenzhe Cai, Siyuan Huang, Guangran Cheng, Yuxing Long, Peng Gao, Changyin Sun, and Hao Dong.
[page]

Non-Visual Perception: Tactile

Learning visuotactile skills with two multifingered hands, ArXiv, 2024. Lin, Toru and Zhang, Yu and Li, Qiyang and Qi, Haozhi and Yi, Brent and Levine, Sergey and Malik, Jitendra. [page]
Binding touch to everything: Learning unified multimodal tactile representations, CVPR, 2024. Yang, Fengyu and Feng, Chao and Chen, Ziyang and Park, Hyoungseob and Wang, Daniel and Dou, Yiming and Zeng, Ziyao and Chen, Xien and Gangopadhyay, Rit and Owens, Andrew and others. [page]
Give Me a Sign: Using Data Gloves for Static Hand-Shape Recognition, Sensors, 2023. Achenbach, Philipp and Laux, Sebastian and Purdack, Dennis and Müller, Philipp Niklas and Göbel, Stefan. [page]
Semantics-aware adaptive knowledge distillation for sensor-to-vision action recognition, IEEE Transactions on Image Processing, 2021. Liu, Yang and Wang, Keze and Li, Guanbin and Lin, Liang. [page]
Hand movements: A window into haptic object recognition, Cognitive psychology, 1987. Lederman, Susan J and Klatzky, Roberta L. [page]
Force and tactile sensing, Springer Handbook of Robotics, 2016. Cutkosky, Mark R and Howe, Robert D and Provancher, William R. [page]
Haptic perception: A tutorial, Attention, Perception, & Psychophysics, 2009. Lederman, Susan J and Klatzky, Roberta L. [page]
Flexible tactile sensing based on piezoresistive composites: A review, Sensors, 2014. Stassi, Stefano and Cauda, Valentina and Canavese, Giancarlo and Pirri, Candido Fabrizio. [page]
Tactile sensing in dexterous robot hands, Robotics and Autonomous Systems, 2015. Kappassov, Zhanat and Corrales, Juan-Antonio and Perdereau, Véronique. [page]
Bioinspired sensors and applications in intelligent robots: a review, Robotic Intelligence and Automation, 2024. Zhou, Yanmin and Yan, Zheng and Yang, Ye and Wang, Zhipeng and Lu, Ping and Yuan, Philip F and He, Bin. [page]
Sensing tactile microvibrations with the BioTac—Comparison with human sensitivity, BioRob, 2012. Fishel, Jeremy A and Loeb, Gerald E. [page]
Simulation of the SynTouch BioTac sensor, Intelligent Autonomous Systems 15: Proceedings of the 15th International Conference IAS-15, 2019. Ruppel, Philipp and Jonetzko, Yannick and Görner, Michael and Hendrich, Norman and Zhang, Jianwei. [page]
GelSight: High-Resolution Robot Tactile Sensors for Estimating Geometry and Force, Sensors, 2017. Yuan, Wenzhen and Dong, Siyuan and Adelson, Edward H. [page]
Gelslim 3.0: High-resolution measurement of shape, force and slip in a compact tactile-sensing finger, ICRA, 2022. Taylor, Ian H and Dong, Siyuan and Rodriguez, Alberto. [page]
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor With Application to In-Hand Manipulation, IEEE Robotics and Automation Letters, 2020. Lambeta, Mike and Chou, Po-Wei and Tian, Stephen and Yang, Brian and Maloon, Benjamin and Most, Victoria Rose and Stroud, Dave and Santos, Raymond and Byagowi, Ahmad and Kammerer, Gregg and Jayaraman, Dinesh and Calandra, Roberto. [page]
9dtact: A compact vision-based tactile sensor for accurate 3D shape reconstruction and generalizable 6D force estimation, IEEE Robotics and Automation Letters, 2023. Lin, Changyi and Zhang, Han and Xu, Jikai and Wu, Lei and Xu, Huazhe. [page]
The tactip family: Soft optical tactile sensors with 3D-printed biomimetic morphologies, Soft Robotics, 2018. Ward-Cherrier, Benjamin and Pestell, Nicholas and Cramphorn, Luke and Winstone, Benjamin and Giannaccini, Maria Elena and Rossiter, Jonathan and Lepora, Nathan F. [page]
GelTip: A finger-shaped optical tactile sensor for robotic manipulation, IROS, 2020. Gomes, Daniel Fernandes and Lin, Zhonglin and Luo, Shan. [page]
Allsight: A low-cost and high-resolution round tactile sensor with zero-shot learning capability, IEEE Robotics and Automation Letters, 2023. Azulay, Osher and Curtis, Nimrod and Sokolovsky, Rotem and Levitski, Guy and Slomovik, Daniel and Lilling, Guy and Sintov, Avishai. [page]
Tacto: A fast, flexible, and open-source simulator for high-resolution vision-based tactile sensors, IEEE Robotics and Automation Letters, 2022. Wang, Shaoxiong and Lambeta, Mike and Chou, Po-Wei and Calandra, Roberto. [page]
Taxim: An example-based simulation model for GelSight tactile sensors, IEEE Robotics and Automation Letters, 2022. Si, Zilin and Yuan, Wenzhen. [page]
Vistac towards a unified multi-modal sensing finger for robotic manipulation, IEEE Sensors Journal, 2023. Athar, Sheeraz and Patel, Gaurav and Xu, Zhengtong and Qiu, Qiang and She, Yu. [page]
GelLink: A Compact Multi-phalanx Finger with Vision-based Tactile Sensing and Proprioception, arXiv, 2024. Ma, Yuxiang and Adelson, Edward. [page]
The feeling of success: Does touch sensing help predict grasp outcomes?, arXiv, 2017. Calandra, Roberto and Owens, Andrew and Upadhyaya, Manu and Yuan, Wenzhen and Lin, Justin and Adelson, Edward H and Levine, Sergey. [page]
Robust learning of tactile force estimation through robot interaction, ICRA, 2019. Sundaralingam, Balakumar and Lambert, Alexander Sasha and Handa, Ankur and Boots, Byron and Hermans, Tucker and Birchfield, Stan and Ratliff, Nathan and Fox, Dieter. [page]
Objectfolder: A dataset of objects with implicit visual, auditory, and tactile representations, arXiv, 2021. Gao, Ruohan and Chang, Yen-Yu and Mall, Shivani and Fei-Fei, Li and Wu, Jiajun. [page]
Objectfolder 2.0: A multisensory object dataset for sim2real transfer, CVPR, 2022. Gao, Ruohan and Si, Zilin and Chang, Yen-Yu and Clarke, Samuel and Bohg, Jeannette and Fei-Fei, Li and Yuan, Wenzhen and Wu, Jiajun. [page]
Self-supervised visuo-tactile pretraining to locate and follow garment features, ArXiv, 2022. Kerr, Justin and Huang, Huang and Wilcox, Albert and Hoque, Ryan and Ichnowski, Jeffrey and Calandra, Roberto and Goldberg, Ken. [page]
Midastouch: Monte-carlo inference over distributions across sliding touch, CoRL, 2023. Suresh, Sudharshan and Si, Zilin and Anderson, Stuart and Kaess, Michael and Mukadam, Mustafa. [page]
The objectfolder benchmark: Multisensory learning with neural and real objects, CVPR, 2023. Gao, Ruohan and Dou, Yiming and Li, Hao and Agarwal, Tanmay and Bohg, Jeannette and Li, Yunzhu and Fei-Fei, Li and Wu, Jiajun. [page]
A Touch, Vision, and Language Dataset for Multimodal Alignment, ArXiv, 2024. Fu, Letian and Datta, Gaurav and Huang, Huang and Panitch, William Chung-Ho and Drake, Jaimyn and Ortiz, Joseph and Mukadam, Mustafa and Lambeta, Mike and Calandra, Roberto and Goldberg, Ken. [page]
Learning transferable visual models from natural language supervision, International Conference on Machine Learning, 2021. Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others. [page]
Imagebind: One embedding space to bind them all, CVPR, 2023. Girdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan. [page]
Improved GelSight tactile sensor for measuring geometry and slip, IROS, 2017. Dong, Siyuan and Yuan, Wenzhen and Adelson, Edward H. [page]
GelSight: High-resolution robot tactile sensors for estimating geometry and force, Sensors, vol. 17, no. 12, pp. 2762, 2017. Yuan, Wenzhen and Dong, Siyuan and Adelson, Edward H. [page]
From pixels to percepts: Highly robust edge perception and contour following using deep learning and an optical biomimetic tactile sensor, IEEE Robotics and Automation Letters, 2019. Lepora, Nathan F and Church, Alex and De Kerckhove, Conrad and Hadsell, Raia and Lloyd, John. [page]
Deep tactile experience: Estimating tactile sensor output from depth sensor data, IROS, 2020. Patel, Karankumar and Iba, Soshi and Jamali, Nawid. [page]
Touching a nerf: Leveraging neural radiance fields for tactile sensory data generation, Conference on Robot Learning, pp. 1618-1628, 2023. Zhong, Shaohong and Albini, Alessandro and Jones, Oiwi Parker and Maiolino, Perla and Posner, Ingmar. [page]
Learning to read braille: Bridging the tactile reality gap with diffusion models, ArXiv, 2023. Higuera, Carolina and Boots, Byron and Mukadam, Mustafa. [page]
Generating visual scenes from touch, CVPR, 2023. Yang, Fengyu and Zhang, Jiacheng and Owens, Andrew. [page]
GelSight wedge: Measuring high-resolution 3D contact geometry with a compact robot finger, ICRA, 2021. Wang, Shaoxiong and She, Yu and Romero, Branden and Adelson, Edward. [page]
DTact: A vision-based tactile sensor that measures high-resolution 3D geometry directly from darkness, ICRA, 2023. Lin, Changyi and Lin, Ziqi and Wang, Shaoxiong and Xu, Huazhe. [page]
3D shape perception from monocular vision, touch, and shape priors, IROS, 2018. Wang, Shaoxiong and Wu, Jiajun and Sun, Xingyuan and Yuan, Wenzhen and Freeman, William T and Tenenbaum, Joshua B and Adelson, Edward H. [page]
Tactile mapping and localization from high-resolution tactile imprints, ICRA, 2019. Bauza, Maria and Canal, Oleguer and Rodriguez, Alberto. [page]
Tactile object pose estimation from the first touch with geometric contact rendering, CoRL, 2021. Villalonga, Maria Bauza and Rodriguez, Alberto and Lim, Bryan and Valls, Eric and Sechopoulos, Theo. [page]
Visuotactile 6D pose estimation of an in-hand object using vision and tactile sensor data, IEEE Robotics and Automation Letters, 2022. Dikhale, Snehal and Patel, Karankumar and Dhingra, Daksh and Naramura, Itoshi and Hayashi, Akinobu and Iba, Soshi and Jamali, Nawid. [page]
In-hand pose estimation using hand-mounted RGB cameras and visuotactile sensors, IEEE Access, 2023. Gao, Yuan and Matsuoka, Shogo and Wan, Weiwei and Kiyokawa, Takuya and Koyama, Keisuke and Harada, Kensuke. [page]
Collision-aware in-hand 6D object pose estimation using multiple vision-based tactile sensors, ICRA, 2023. Caddeo, Gabriele M and Piga, Nicola A and Bottarel, Fabrizio and Natale, Lorenzo. [page]
ShapeMap 3-D: Efficient shape mapping through dense touch and vision, ICRA, 2022. Suresh, Sudharshan and Si, Zilin and Mangelson, Joshua G and Yuan, Wenzhen and Kaess, Michael. [page]
3D shape reconstruction from vision and touch, NeurIPS, 2020. Smith, Edward and Calandra, Roberto and Romero, Adriana and Gkioxari, Georgia and Meger, David and Malik, Jitendra and Drozdzal, Michal. [page]
Active 3D shape reconstruction from vision and touch, NeurIPS, 2021. Smith, Edward and Meger, David and Pineda, Luis and Calandra, Roberto and Malik, Jitendra and Romero Soriano, Adriana and Drozdzal, Michal. [page]
Implicit neural representation for 3D shape reconstruction using vision-based tactile sensing, ArXiv, 2023. Comi, Mauro and Church, Alex and Li, Kejie and Aitchison, Laurence and Lepora, Nathan F. [page]
Large-scale actionless video pre-training via discrete diffusion for efficient policy learning, ArXiv, 2024. He, Haoran and Bai, Chenjia and Pan, Ling and Zhang, Weinan and Zhao, Bin and Li, Xuelong. [page]
Sliding touch-based exploration for modeling unknown object shape with multi-fingered hands, IROS, 2023. Chen, Yiting and Tekden, Ahmet Ercan and Deisenroth, Marc Peter and Bekiroglu, Yasemin. [page]
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces, ArXiv, 2024. Comi, Mauro and Tonioni, Alessio and Yang, Max and Tremblay, Jonathan and Blukis, Valts and Lin, Yijiong and Lepora, Nathan F and Aitchison, Laurence. [page]
Tactile-augmented radiance fields, CVPR, 2024. Dou, Yiming and Yang, Fengyu and Liu, Yi and Loquercio, Antonio and Owens, Andrew. [page]
Visuotactile-RL: Learning multimodal manipulation policies with deep reinforcement learning, ICRA, 2022. Hansen, Johanna and Hogan, Francois and Rivkin, Dmitriy and Meger, David and Jenkin, Michael and Dudek, Gregory. [page]
General In-hand Object Rotation with Vision and Touch, CoRL, 2023. Qi, Haozhi and Yi, Brent and Suresh, Sudharshan and Lambeta, Mike and Ma, Yi and Calandra, Roberto and Malik, Jitendra. [page]
Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing, IEEE Robotics and Automation Letters, 2023. Yang, Max and Lin, Yijiong and Church, Alex and Lloyd, John and Zhang, Dandan and Barton, David AW and Lepora, Nathan F. [page]
AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch, ArXiv, 2024. Yang, Max and Lu, Chenghua and Church, Alex and Lin, Yijiong and Ford, Chris and Li, Haoran and Psomopoulou, Efi and Barton, David AW and Lepora, Nathan F. [page]
Unsupervised adversarial domain adaptation for sim-to-real transfer of tactile images, IEEE Transactions on Instrumentation and Measurement, 2023. Jing, Xingshuo and Qian, Kun and Jianu, Tudor and Luo, Shan. [page]
Feature-level Sim2Real Regression of Tactile Images for Robot Manipulation, ICRA ViTac, 2024. Duan, Boyi and Qian, Kun and Zhao, Yongqiang and Zhang, Dongyuan and Luo, Shan. [page]
Tactile Gym 2.0: Sim-to-real deep reinforcement learning for comparing low-cost high-resolution robot touch, IEEE Robotics and Automation Letters, 2022. Lin, Yijiong and Lloyd, John and Church, Alex and Lepora, Nathan F. [page]
Convolutional autoencoder for feature extraction in tactile sensing, IEEE Robotics and Automation Letters, 2019. Polic, Marsela and Krajacic, Ivona and Lepora, Nathan and Orsag, Matko. [page]
Supervised autoencoder joint learning on heterogeneous tactile sensory data: Improving material classification performance, IROS, 2020. Gao, Ruihan and Taunyazov, Tasbolat and Lin, Zhiping and Wu, Yan. [page]
Learn from incomplete tactile data: Tactile representation learning with masked autoencoders, IROS, 2023. Cao, Guanqun and Jiang, Jiaqi and Bollegala, Danushka and Luo, Shan. [page]
MAE4GM: Visuo-Tactile Learning for Property Estimation of Granular Material using Multimodal Autoencoder, ICRA ViTac, 2024. Zhang, Zeqing and Zheng, Guangze and Ji, Xuebo and Chen, Guanqi and Jia, Ruixing and Chen, Wentao and Chen, Guanhua and Zhang, Liangjun and Pan, Jia. [page]
Connecting look and feel: Associating the visual and tactile properties of physical materials, CVPR, 2017. Yuan, Wenzhen and Wang, Shaoxiong and Dong, Siyuan and Adelson, Edward. [page]
Making sense of vision and touch: Learning multimodal representations for contact-rich tasks, IEEE Transactions on Robotics, 2020. Lee, Michelle A and Zhu, Yuke and Zachares, Peter and Tan, Matthew and Srinivasan, Krishnan and Savarese, Silvio and Fei-Fei, Li and Garg, Animesh and Bohg, Jeannette. [page]
Learning to identify object instances by touch: Tactile recognition via multimodal matching, ICRA, 2019. Lin, Justin and Calandra, Roberto and Levine, Sergey. [page]
Touch and go: Learning from human-collected vision and touch, ArXiv, 2022. Yang, Fengyu and Ma, Chenyang and Zhang, Jiacheng and Zhu, Jing and Yuan, Wenzhen and Owens, Andrew. [page]
Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play, ArXiv, 2023. Guzey, Irmak and Evans, Ben and Chintala, Soumith and Pinto, Lerrel. [page]
Octopi: Object Property Reasoning with Large Tactile-Language Models, arXiv preprint arXiv:2405.02794, 2024. Yu, Samson and Lin, Kelvin and Xiao, Anxing and Duan, Jiafei and Soh, Harold. [page]
Learning efficient haptic shape exploration with a rigid tactile sensor array, PLoS ONE, 2020. Fleer, Sascha and Moringen, Alexandra and Klatzky, Roberta L and Ritter, Helge. [page]
Interpreting and predicting tactile signals via a physics-based and data-driven framework, ArXiv, 2020. Narang, Yashraj S and Van Wyk, Karl and Mousavian, Arsalan and Fox, Dieter. [page]
Interpreting and predicting tactile signals for the SynTouch BioTac, The International Journal of Robotics Research, 2021. Narang, Yashraj S and Sundaralingam, Balakumar and Van Wyk, Karl and Mousavian, Arsalan and Fox, Dieter. [page]
Stable reinforcement learning with autoencoders for tactile and visual data, IROS, 2016. Van Hoof, Herke and Chen, Nutan and Karl, Maximilian and van der Smagt, Patrick and Peters, Jan. [page]
Fast texture classification using tactile neural coding and spiking neural network, IROS, 2020. Taunyazov, Tasbolat and Chua, Yansong and Gao, Ruihan and Soh, Harold and Wu, Yan. [page]
When Vision Meets Touch: A Contemporary Review for Visuotactile Sensors from the Signal Processing Perspective, ArXiv, 2024. Li, Shoujie and Wang, Zihan and Wu, Changsheng and Li, Xiang and Luo, Shan and Fang, Bin and Sun, Fuchun and Zhang, Xiao-Ping and Ding, Wenbo. [page]

Embodied Interaction 🔝

Embodied Question Answering, CVPR, 2018 Das, Abhishek and Datta, Samyak and Gkioxari, Georgia and Lee, Stefan and Parikh, Devi and Batra, Dhruv [page]
Multi-Target Embodied Question Answering, CVPR, 2019 Yu, Licheng and Chen, Xinlei and Gkioxari, Georgia and Bansal, Mohit and Berg, Tamara L and Batra, Dhruv [page]
Embodied Question Answering in Photorealistic Environments with Point Cloud Perception, CVPR, 2019 Wijmans, Erik and Datta, Samyak and Maksymets, Oleksandr and Das, Abhishek and Gkioxari, Georgia and Lee, Stefan and Essa, Irfan and Parikh, Devi and Batra, Dhruv [page]
IQA: Visual Question Answering in Interactive Environments, CVPR, 2018 Gordon, Daniel and Kembhavi, Aniruddha and Rastegari, Mohammad and Redmon, Joseph and Fox, Dieter and Farhadi, Ali [page]
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering, BMVC, 2019 Cangea, Cătălina and Belilovsky, Eugene and Liò, Pietro and Courville, Aaron [page]
Knowledge-based Embodied Question Answering, TPAMI, 2023 Tan, Sinan and Ge, Mengmeng and Guo, Di and Liu, Huaping and Sun, Fuchun [page]
OpenEQA: Embodied Question Answering in the Era of Foundation Models, CVPR, 2024 Majumdar, Arjun and Ajay, Anurag and Zhang, Xiaohan and Putta, Pranav and Yenamandra, Sriram and Henaff, Mikael and Silwal, Sneha and Mcvay, Paul and Maksymets, Oleksandr and Arnaud, Sergio and others [page]
Explore until Confident: Efficient Exploration for Embodied Question Answering, ICRA Workshop VLMNM, 2024 Ren, Allen Z and Clark, Jaden and Dixit, Anushri and Itkina, Masha and Majumdar, Anirudha and Sadigh, Dorsa [page]
S-EQA: Tackling Situational Queries in Embodied Question Answering, arXiv, 2024 Dorbala, Vishnu Sashank and Goyal, Prasoon and Piramuthu, Robinson and Johnston, Michael and Manocha, Dinesh and Ghanadhan, Reza [page]
Building Generalizable Agents with a Realistic and Rich 3D Environment, ECCV, 2018 Wu, Yi and Wu, Yuxin and Gkioxari, Georgia and Tian, Yuandong [page]
MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments, ECCV, 2018 Savva, Manolis and Chang, Angel X and Dosovitskiy, Alexey and Funkhouser, Thomas and Koltun, Vladlen [page]
Matterport3D: Learning from RGB-D data in indoor environments, IEEE International Conference on 3D Vision, 2017 Chang, Angel and Dai, Angela and Funkhouser, Thomas and Halber, Maciej and Niessner, Matthias and Savva, Manolis and Song, Shuran and Zeng, Andy and Zhang, Yinda [page]
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, CVPR, 2017 Dai, Angela and Chang, Angel X and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Nießner, Matthias [page]
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI, NeurIPS, 2021 Ramakrishnan, Santhosh K and Gokaslan, Aaron and Wijmans, Erik and Maksymets, Oleksandr and Clegg, Alex and Turner, John and Undersander, Eric and Galuba, Wojciech and Westbury, Andrew and Chang, Angel X and others [page]
Neural Modular Control for Embodied Question Answering, ECCV, 2018 Das, Abhishek and Gkioxari, Georgia and Lee, Stefan and Parikh, Devi and Batra, Dhruv [page]
Revisiting EmbodiedQA: A Simple Baseline and Beyond, IEEE Transactions on Image Processing, 2020 Wu, Yu and Jiang, Lu and Yang, Yi [page]
Multi-agent Embodied Question Answering in Interactive Environments, ECCV, 2020 Tan, Sinan and Xiang, Weilai and Liu, Huaping and Guo, Di and Sun, Fuchun [page]
A frontier-based approach for autonomous exploration, CIRA, 1997 Yamauchi, Brian [page]
Map-based Modular Approach for Zero-shot Embodied Question Answering, arXiv, 2024 Sakamoto, Koya and Azuma, Daichi and Miyanishi, Taiki and Kurita, Shuhei and Kawanabe, Motoaki [page]
Embodied Question Answering via Multi-LLM Systems, arXiv, 2024 Bhrij Patel and Vishnu Sashank Dorbala and Amrit Singh Bedi [page]
Language Models are Few-Shot Learners, NeurIPS, 2020 Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others [page]
Deep Learning Approaches to Grasp Synthesis: A Review, IEEE Transactions on Robotics, 2023 Newbury, Rhys and Gu, Morris and Chumbley, Lachlan and Mousavian, Arsalan and Eppner, Clemens and Leitner, Jürgen and Bohg, Jeannette and Morales, Antonio and Asfour, Tamim and Kragic, Danica and others [page]
End-to-end Trainable Deep Neural Network for Robotic Grasp Detection and Semantic Segmentation from RGB, ICRA, 2021 Ainetter, Stefan and Fraundorfer, Friedrich [page]
Jacquard: A Large Scale Dataset for Robotic Grasp Detection, IROS, 2018 Depierre, Amaury and Dellandréa, Emmanuel and Chen, Liming [page]
Efficient grasping from RGBD images: Learning using a new rectangle representation, IEEE International Conference on Robotics and Automation, 2011 Jiang, Yun and Moseson, Stephen and Saxena, Ashutosh [page]
Shape Completion Enabled Robotic Grasping, IROS, 2017 Varley, Jacob and DeChant, Chad and Richardson, Adam and Ruales, Joaquín and Allen, Peter [page]
GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping, CVPR, 2020 Fang, Hao-Shu and Wang, Chenxi and Gou, Minghao and Lu, Cewu [page]
ACRONYM: A Large-Scale Grasp Dataset Based on Simulation, ICRA, 2021 Eppner, Clemens and Mousavian, Arsalan and Fox, Dieter [page]
6-DOF GraspNet: Variational Grasp Generation for Object Manipulation, ICCV, 2019 Mousavian, Arsalan and Eppner, Clemens and Fox, Dieter [page]
MultiGripperGrasp: A Dataset for Robotic Grasping from Parallel Jaw Grippers to Dexterous Hands, arXiv, 2024 Murrilo, Luis Felipe Casas and Khargonkar, Ninad and Prabhakaran, Balakrishnan and Xiang, Yu [page]
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter, CoRL, 2023 Tziafas, Georgios and Xu, Yucheng and Goel, Arushi and Kasaei, Mohammadreza and Li, Zhibin and Kasaei, Hamidreza [page]
Reasoning Grasping via Multimodal Large Language Model, arXiv, 2024 Jin, Shiyu and Xu, Jinxuan and Lei, Yutian and Zhang, Liangjun [page]
Reasoning Tuning Grasp: Adapting Multi-Modal Large Language Models for Robotic Grasping, CoRL, 2023 Xu, Jinxuan and Jin, Shiyu and Lei, Yutian and Zhang, Yuqian and Zhang, Liangjun [page]
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization, CoRR, 2024 Li, Kailin and Wang, Jingbo and Yang, Lixin and Lu, Cewu and Dai, Bo [page]
CLIPort: What and Where Pathways for Robotic Manipulation, CoRL, 2022 Shridhar, Mohit and Manuelli, Lucas and Fox, Dieter [page]
Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, CoRL, 2023 Shen, William and Yang, Ge and Yu, Alan and Wong, Jansen and Kaelbling, Leslie Pack and Isola, Phillip [page]
GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping, arXiv, 2024 Zheng, Yuhang and Chen, Xiangyu and Zheng, Yupeng and Gu, Songen and Yang, Runyi and Jin, Bu and Li, Pengfei and Zhong, Chengliang and Wang, Zengmao and Liu, Lina and others [page]
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains, IEEE Transactions on Robotics, 2023 Fang, Hao-Shu and Wang, Chenxi and Fang, Hongjie and Gou, Minghao and Liu, Jirong and Yan, Hengxu and Liu, Wenhai and Xie, Yichen and Lu, Cewu [page]

Embodied Agent 🔝

RT-1: Robotics transformer for real-world control at scale, ArXiv, 2022. Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, and others. [page]
Do As I Can, Not As I Say: Grounding language in robotic affordances, Conference on Robot Learning, 2023. Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian. [page]
Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions, Conference on Robot Learning, 2023. Yevgen Chebotar, Quan Vuong, Karol Hausman, Fei Xia, Yao Lu, Alex Irpan, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, and others. [page]
PaLM-E: An embodied multimodal language model, ArXiv, 2023. Danny Driess, Fei Xia, Mehdi SM Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, and others. [page]
RT-2: Vision-language-action models transfer web knowledge to robotic control, Conference on Robot Learning, 2023. Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, and others. [page]
RT-H: Action hierarchies using language, ArXiv, 2024. Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quon Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh. [page]
Open X-Embodiment: Robotic learning datasets and RT-X models, arXiv preprint arXiv:2310.08864, 2023. Padalkar and others. [page]
EmbodiedGPT: Vision-language pre-training via embodied chain of thought, NeurIPS, 2024. Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo. [page]
Vision-language foundation models as effective robot imitators, ArXiv, 2023. Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, and others. [page]
AutoRT: Embodied foundation models for large scale orchestration of robotic agents, arXiv preprint arXiv:2401.12963, 2024. Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, and others. [page]
SARA-RT: Scaling up robotics transformers with self-adaptive robust attention, ArXiv, 2023. Isabel Leal, Krzysztof Choromanski, Deepali Jain, Avinava Dubey, Jake Varley, Michael Ryoo, Yao Lu, Frederick Liu, Vikas Sindhwani, Quan Vuong, and others. [page]
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation, ArXiv, 2024. Jiaming Liu, Mengzhen Liu, Zhenyu Wang, Lily Lee, Kaichen Zhou, Pengju An, Senqiao Yang, Renrui Zhang, Yandong Guo, Shanghang Zhang. [page]
CLIPort: What and where pathways for robotic manipulation, Conference on Robot Learning, 2022. Mohit Shridhar, Lucas Manuelli, Dieter Fox. [page]
STRIPS: A new approach to the application of theorem proving to problem solving, Artificial Intelligence, 2(3), pp. 189-208, 1971. Richard E. Fikes, Nils J. Nilsson. [page]
PDDL - The Planning Domain Definition Language, Technical Report, 1998. Drew McDermott, Malik Ghallab, Adele E. Howe, Craig A. Knoblock, Ashwin Ram, Manuela M. Veloso, Daniel S. Weld, David E. Wilkins. [page]
The Monte Carlo method, Journal of the American Statistical Association, 44(247), pp. 335-341, 1949. Nicholas C. Metropolis, S. M. Ulam. [page]
A Formal Basis for the Heuristic Determination of Minimum Cost Paths, IEEE Trans. Syst. Sci. Cybern., 4, pp. 100-107, 1968. Peter E. Hart, Nils J. Nilsson, Bertram Raphael. [page]
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents, ICML, 2022. Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch. [page]
Inner Monologue: Embodied Reasoning through Planning with Language Models, Conference on Robot Learning, 2022. Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter. [page]
LoTa-Bench: Benchmarking language-oriented task planners for embodied agents, ArXiv, 2024. Jae-Woo Choi, Youngwoo Yoon, Hyobin Ong, Jaehong Kim, Minsu Jang. [page]
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models, ICCV, 2023. Chan Hee Song, Jiaman Wu, Clay Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su. [page]
Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models, EMNLP, 2023. Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki. [page]
Voyager: An Open-Ended Embodied Agent with Large Language Models, 2023. Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar. [page]
Skill Induction and Planning with Latent Language, ACL, 2021. Pratyusha Sharma, Antonio Torralba, Jacob Andreas. [page]
ReAct: Synergizing Reasoning and Acting in Language Models, 2023. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao. [page]
ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models, 2022. Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, Animesh Garg. [page]
ChatGPT for Robotics: Design Principles and Model Abilities, IEEE Access, 12, pp. 55682-55696, 2023. Sai Vemprala, Rogerio Bonatti, Arthur Fender C. Bucker, Ashish Kapoor. [page]
Code as Policies: Language Model Programs for Embodied Control, ICRA, 2023. Jacky Liang, Wenlong Huang, F. Xia, Peng Xu, Karol Hausman, Brian Ichter, Peter R. Florence, Andy Zeng. [page]
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, 2022. Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence. [page]
Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following, 2024. Suyeon Shin, Sujin Jeon, Junghyun Kim, Gi-Cheon Kang, Byoung-Tak Zhang. [page]
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning, 2023. Zirui Zhao, Wee Sun Lee, David Hsu. [page]
Reasoning with Language Model Is Planning with World Model, 2023. Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu. [page]
LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement, arXiv, 2023. Haonan Chang, Kai Gao, Kowndinya Boyalakuntla, Alex Lee, Baichuan Huang, Harish Udhaya Kumar, Jinjin Yu, Abdeslam Boularias. [page]
Translating Natural Language to Planning Goals with Large-Language Models, arXiv, 2023. Yaqi Xie, Chen Yu, Tongyao Zhu, Jinbin Bai, Ze Gong, Harold Soh. [page]
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency, arXiv, 2023. Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, Peter Stone. [page]
Generalized Planning in PDDL Domains with Pretrained Large Language Models, arXiv, 2023. Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B. Tenenbaum, Leslie Pack Kaelbling, Michael Katz. [page]
Dynamic Planning with a LLM, arXiv, 2023. Gautier Dagan, Frank Keller, Alex Lascarides. [page]
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration, arXiv, 2024. Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Xuelong Li, Zhen Wang. [page]
Embodied Task Planning with Large Language Models, arXiv, 2023. Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan. [page]
SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning, Conference on Robot Learning, 2023. Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian D. Reid, Niko Sunderhauf. [page]
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning, ArXiv, 2023. Qiao Gu, Ali Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Ramalingam Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull. [page]
RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks, arXiv, 2023. Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Dong Zhao, He Wang. [page]
Embodied Instruction Following in Unknown Environments, arXiv, 2024. Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan. [page]
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models, 2023. Xufeng Zhao, Mengdi Li, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter. [page]
Video Language Planning, 2023. Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson. [page]
Reflexion: an autonomous agent with dynamic memory and self-reflection, ArXiv, 2023. Noah Shinn, Beck Labash, A. Gopinath. [page]
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents, ArXiv, 2023. Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang. [page]
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model, ArXiv, 2023.
Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, and Hongsheng Li.
[page]
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models, IROS, 2024.
Siyuan Huang, Iaroslav Ponomarenko, Zhengkai Jiang, Xiaoqi Li, Xiaobin Hu, Peng Gao, Hongsheng Li, and Hao Dong.
[page]
A3VLM: Actionable Articulation-Aware Vision Language Model, ArXiv, 2024.
Siyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu, Hao Dong, Peng Gao, Abdeslam Boularias, and Hongsheng Li.
[page]

Sim-to-Real Adaptation 🔝

World Models, NeurIPS, 2018 Ha, David and Schmidhuber, Jürgen [page]
Pandora: Towards General World Model with Natural Language Actions and Video States, arXiv, 2024 Xiang, Jiannan and Liu, Guangyi and Gu, Yi and Gao, Qiyue and Ning, Yuting and Zha, Yuheng and Feng, Zeyu and Tao, Tianhua and Hao, Shibo and Shi, Yemin and others [page]
3D-VLA: A 3D Vision-Language-Action Generative World Model, ICML, 2024 Zhen, Haoyu and Qiu, Xiaowen and Chen, Peihao and Yang, Jincheng and Yan, Xin and Du, Yilun and Hong, Yining and Gan, Chuang [page]
Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning, arXiv, 2024 Ding, Zihan and Zhang, Amy and Tian, Yuandong and Zheng, Qinqing [page]
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features, ICLR, 2024 Bardes, Adrien and Ponce, Jean and LeCun, Yann [page]
A-JEPA: Joint-Embedding Predictive Architecture Can Listen, arXiv, 2023 Fei, Zhengcong and Fan, Mingyuan and Huang, Junshi [page]
Learning and Leveraging World Models in Visual Representation Learning, arXiv, 2024 Garrido, Quentin and Assran, Mahmoud and Ballas, Nicolas and Bardes, Adrien and Najman, Laurent and LeCun, Yann [page]
iVideoGPT: Interactive VideoGPTs are Scalable World Models, arXiv, 2024 Wu, Jialong and Yin, Shaofeng and Feng, Ningya and He, Xu and Li, Dong and Hao, Jianye and Long, Mingsheng [page]
Spatiotemporal Predictive Pre-training for Robotic Motor Control, arXiv, 2024 Yang, Jiange and Liu, Bei and Fu, Jianlong and Pan, Bocheng and Wu, Gangshan and Wang, Limin [page]
MuDreamer: Learning Predictive World Models without Reconstruction, ICLR, 2024 Burchi, Maxime and Timofte, Radu [page]
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought, arXiv, 2024 Wong, Lionel and Grand, Gabriel and Lew, Alexander K and Goodman, Noah D and Mansinghka, Vikash K and Andreas, Jacob and Tenenbaum, Joshua B [page]
ElastoGen: 4D Generative Elastodynamics, arXiv, 2024 Feng, Yutao and Shang, Yintong and Feng, Xiang and Lan, Lei and Zhe, Shandian and Shao, Tianjia and Wu, Hongzhi and Zhou, Kun and Su, Hao and Jiang, Chenfanfu and others [page]
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization, NeurIPS, 2023 Liu, Minghua and Xu, Chao and Jin, Haian and Chen, Linghao and Varma T, Mukund and Xu, Zexiang and Su, Hao [page]
LEGENT: Open Platform for Embodied Agents, arXiv, 2024 Cheng, Zhili and Wang, Zhitong and Hu, Jinyi and Hu, Shengding and Liu, An and Tu, Yuge and Li, Pengkai and Shi, Lei and Liu, Zhiyuan and Sun, Maosong [page]
Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud, arXiv, 2024 Saito, Ayumu and Poovvancheri, Jiju [page]
A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27, Open Review, 2022 Yann LeCun [page]
Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence, arXiv, 2023 Dawid, Anna and LeCun, Yann [page]
Real2Sim2Real: Self-Supervised Learning of Physical Single-Step Dynamic Actions for Planar Robot Casting, ICRA, 2022 Lim, Vincent and Huang, Huang and Chen, Lawrence Yunliang and Wang, Jonathan and Ichnowski, Jeffrey and Seita, Daniel and Laskey, Michael and Goldberg, Ken [page]
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots, arXiv, 2024 Chi, Cheng and Xu, Zhenjia and Pan, Chuer and Cousineau, Eric and Burchfiel, Benjamin and Feng, Siyuan and Tedrake, Russ and Song, Shuran [page]
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, arXiv, 2024 Fu, Zipeng and Zhao, Tony Z and Finn, Chelsea [page]
Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition, arXiv, 2024 Luo, Shengcheng and Peng, Quanquan and Lv, Jun and Hong, Kaiwen and Driggs-Campbell, Katherine Rose and Lu, Cewu and Li, Yong-Lu [page]
Transporter Networks: Rearranging the Visual World for Robotic Manipulation, CoRL, 2021 Zeng, Andy and Florence, Pete and Tompson, Jonathan and Welker, Stefan and Chien, Jonathan and Attarian, Maria and Armstrong, Travis and Krasin, Ivan and Duong, Dan and Sindhwani, Vikas and others [page]
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts, CVPR, 2023 Geng, Haoran and Xu, Helin and Zhao, Chengyang and Xu, Chao and Yi, Li and Huang, Siyuan and Wang, He [page]
Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation, arXiv, 2024 Torne, Marcel and Simeonov, Anthony and Li, Zechu and Chan, April and Chen, Tao and Gupta, Abhishek and Agrawal, Pulkit [page]
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction, arXiv, 2024 Jiang, Yunfan and Wang, Chen and Zhang, Ruohan and Wu, Jiajun and Fei-Fei, Li [page]
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World, IROS, 2017 Tobin, Josh and Fong, Rachel and Ray, Alex and Schneider, Jonas and Zaremba, Wojciech and Abbeel, Pieter [page]
Learning Dexterous In-Hand Manipulation, The International Journal of Robotics Research, 2020 OpenAI and Andrychowicz, Marcin and Baker, Bowen and Chociej, Maciek and Jozefowicz, Rafal and McGrew, Bob and Pachocki, Jakub and Petron, Arthur and Plappert, Matthias and Powell, Glenn and Ray, Alex and others [page]
Sim-to-Real Reinforcement Learning for Deformable Object Manipulation, CoRL, 2018 Matas, Jan and James, Stephen and Davison, Andrew J [page]
Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization, IROS, 2020 Kaspar, Manuel and Osorio, Juan D Muñoz and Bock, Jürgen [page]
Preparing for the Unknown: Learning a Universal Policy with Online System Identification, RSS, 2017 Yu, Wenhao and Tan, Jie and Liu, C Karen and Turk, Greg [page]
Natural Language Can Help Bridge the Sim2Real Gap, arXiv, 2024 Yu, Albert and Foote, Adeline and Mooney, Raymond and Martín-Martín, Roberto [page]
Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion, IEEE TPAMI, 2023 Huang, Changxin and Wang, Guangrun and Zhou, Zhibo and Zhang, Ronghui and Lin, Liang [page]
DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning, IEEE Robotics and Automation Letters, 2020 Tsounis, Vassilios and Alge, Mitja and Lee, Joonho and Farshidian, Farbod and Hutter, Marco [page]
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware, ICML, 2023 Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea [page]
Visual Whole-Body Control for Legged Loco-Manipulation, arXiv, 2024 Liu, Minghuan and Chen, Zixuan and Cheng, Xuxin and Ji, Yandong and Yang, Ruihan and Wang, Xiaolong [page]
Dynamic walk of a biped, The International Journal of Robotics Research, 1984 Miura, Hirofumi and Shimoyama, Isao [page]
A Compliant Hybrid Zero Dynamics Controller for Stable, Efficient and Fast Bipedal Walking on MABEL, The International Journal of Robotics Research, 2011 Sreenath, Koushil and Park, Hae-Won and Poulakakis, Ioannis and Grizzle, Jessy W [page]
MIT Cheetah 3: Design and Control of a Robust, Dynamic Quadruped Robot, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018 Bledt, Gerardo and Powell, Matthew J and Katz, Benjamin and Di Carlo, Jared and Wensing, Patrick M and Kim, Sangbae [page]
ANYmal - a highly mobile and dynamic quadrupedal robot, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016 Hutter, Marco and Gehring, Christian and Jud, Dominic and Lauber, Andreas and Bellicoso, C Dario and Tsounis, Vassilios and Hwangbo, Jemin and Bodie, Karen and Fankhauser, Peter and Bloesch, Michael and others [page]
Optimization Based Full Body Control for the Atlas Robot, IEEE-RAS International Conference on Humanoid Robots, 2014 Feng, Siyuan and Whitman, Eric and Xinjilefu, X and Atkeson, Christopher G [page]
Optimized Jumping on the MIT Cheetah 3 Robot, ICRA, 2019 Nguyen, Quan and Powell, Matthew J and Katz, Benjamin and Di Carlo, Jared and Kim, Sangbae [page]
Continuous Jumping for Legged Robots on Stepping Stones via Trajectory Optimization and Model Predictive Control, IEEE CDC, 2022 Nguyen, Chuong and Bao, Lingfan and Nguyen, Quan [page]
Practice Makes Perfect: An Optimization-Based Approach to Controlling Agile Motions for a Quadruped Robot, IEEE Robotics & Automation Magazine, 2016 Gehring, Christian and Coros, Stelian and Hutter, Marco and Bellicoso, Carmine Dario and Heijnen, Huub and Diethelm, Remo and Bloesch, Michael and Fankhauser, Péter and Hwangbo, Jemin and Hoepflinger, Mark and others [page]
Dynamic Walking on Randomly-Varying Discrete Terrain With One-Step Preview, Robotics: Science and Systems, 2017 Nguyen, Quan and Agrawal, Ayush and Da, Xingye and Martin, William C and Geyer, Hartmut and Grizzle, Jessy W and Sreenath, Koushil [page]
Deep Kernels for Optimizing Locomotion Controllers, CoRL, 2017 Antonova, Rika and Rai, Akshara and Atkeson, Christopher G [page]
Expressive Whole-Body Control for Humanoid Robots, arXiv, 2024 Cheng, Xuxin and Ji, Yandong and Chen, Junming and Yang, Ruihan and Yang, Ge and Wang, Xiaolong [page]
The MIT Humanoid Robot: Design, Motion Planning, and Control for Acrobatic Behaviors, IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids), 2021 Chignoli, Matthew and Kim, Donghyun and Stanger-Jones, Elijah and Kim, Sangbae [page]

Other Useful Embodied Projects

Awesome-Embodied-Agent-with-LLMs
Awesome Embodied Vision
Habitat-Lab
GibsonEnv
Habitat-Sim
GRUtopia: Dream General Robots in a City at Scale
MANIPULATE-ANYTHING: Automating Real-World Robots using Vision-Language Models
Demonstrating HumanTHOR
RoboMamba
LEGENT: Open Platform for Embodied Agents
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Holodeck: Language Guided Generation of 3D Embodied AI Environments
AllenAct: An open source framework for research in Embodied AI
LEO: An Embodied Generalist Agent in 3D World
EmbodiedScan
EmbodiedQA
Voyager: An Open-Ended Embodied Agent with Large Language Models

Acknowledgement

We sincerely thank Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, and Ganlong Zhao for their contributions.

📰 Citation

If you find this survey helpful, please feel free to leave a star ⭐️ and cite our paper:

@article{liu2024aligning,
  title={Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI},
  author={Liu, Yang and Chen, Weixing and Bai, Yongjie and Li, Guanbin and Gao, Wen and Lin, Liang},
  journal={arXiv preprint arXiv:2407.06886},
  year={2024}
}

