
# Awesome-World-Models

A curated list of papers on World Models for General Video Generation, Embodied AI, and Autonomous Driving, covering foundation papers, blog posts, technical reports, surveys, benchmarks, and application-specific world models. It is intended as a resource for researchers and practitioners interested in world models and their applications in robotics and AI. Template from Awesome-LLM-Robotics and Awesome-World-Model.

Contributions are welcome! Please feel free to submit pull requests or reach out via email to add papers.

If you find this repository useful, please consider citing and giving this list a star ⭐. Feel free to share it with others!
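To add a paper, follow the existing entry format. A hypothetical entry is sketched below (placeholder name, title, and venue; the bracketed fields hold the actual links):

```markdown
- ModelName: "Paper Title Goes Here", arXiv 2025.03. [Paper] [Website] [Code]
```
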
- Foundation paper of World Model
- Blog or Technical Report
- Surveys
- Benchmarks
- General World Models
- World Models for Embodied AI
- World Models for Autonomous Driving
- Citation
## Foundation paper of World Model

- Yann LeCun, A Path Towards Autonomous Machine Intelligence. [Paper]

## Blog or Technical Report

- Cosmos, Cosmos World Foundation Model Platform for Physical AI. [Paper] [Website] [Code]
- 1X Technologies, 1X World Model. [Blog]
- Runway, Introducing General World Models. [Blog]
- Wayve, Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy. [Paper] [Blog]
- "The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey",
arXiv 2025.02
. [Paper] [Code] - "A Survey of World Models for Autonomous Driving",
TPAMI
. [Paper] - "Understanding World or Predicting Future? A Comprehensive Survey of World Models",
arXiv 2024.11
. [Paper] - "World Models: The Safety Perspective",
ISSRE WDMD
. [Paper] - "Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey",
arXiv 2024.11
. [Paper] - "From Efficient Multimodal Models to World Models: A Survey",
arXiv 2024.07
. [Paper] - "Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI",
arXiv 2024.07
. [Paper] [Code] - "Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond",
arXiv 2024.05
. [Paper] [Code] - "World Models for Autonomous Driving: An Initial Survey",
TIV
. [Paper] - "A survey on multimodal large language models for autonomous driving",
WACVW 2024
. [Paper] [Code]
## Benchmarks

- Text2World: "Text2World: Benchmarking Large Language Models for Symbolic World Model Generation", arXiv 2025.02. [Paper] [Website]
- ACT-Bench: "ACT-Bench: Towards Action Controllable World Models for Autonomous Driving", arXiv 2024.12. [Paper]
- WorldSimBench: "WorldSimBench: Towards Video Generation Models as World Simulators", arXiv 2024.10. [Paper] [Website]
- EVA: "EVA: An Embodied World Model for Future Video Anticipation", arXiv 2024.10. [Paper] [Website]
- AeroVerse: "AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models", arXiv 2024.08. [Paper]
- CityBench: "CityBench: Evaluating the Capabilities of Large Language Model as World Model", arXiv 2024.06. [Paper] [Code]
- "Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models", NeurIPS 2023. [Paper]
- "Learning To Explore With Predictive World Model Via Self-Supervised Learning",
arXiv 2025.02
. [Paper] -
M^3:: "M^3: A Modular World Model over Streams of Tokens",
arXiv 2025.02
. [Paper] - "When do neural networks learn world models?",
arXiv 2025.02
. [Paper] - "Pre-Trained Video Generative Models as World Simulators",
arXiv 2025.02
. [Paper] -
DMWM:: "DMWM: Dual-Mind World Model with Long-Term Imagination",
arXiv 2025.02
. [Paper] -
EvoAgent:: "EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks",
arXiv 2025.02
. [Paper] - "Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds",
arXiv 2025.02
. [Paper] - "Generating Symbolic World Models via Test-time Scaling of Large Language Models",
arXiv 2025.02
. [Paper] [Website] - "Improving Transformer World Models for Data-Efficient RL",
arXiv 2025.02
. [Paper] - "Trajectory World Models for Heterogeneous Environments",
arXiv 2025.02
. [Paper] - "Enhancing Memory and Imagination Consistency in Diffusion-based World Models via Linear-Time Sequence Modeling",
arXiv 2025.02
. [Paper] - "Objects matter: object-centric world models improve reinforcement learning in visually complex environments",
arXiv 2025.01
. [Paper] -
GLAM: "GLAM: Global-Local Variation Awareness in Mamba-based World Model",
arXiv 2025.01
. [Paper] -
GAWM: "GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning",
arXiv 2025.01
. [Paper] - "Generative Emergent Communication: Large Language Model is a Collective World Model",
arXiv 2025.01
. [Paper] - "Towards Unraveling and Improving Generalization in World Models",
arXiv 2025.01
. [Paper] - "Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction",
arXiv 2024.12
. [Paper] - "Transformers Use Causal World Models in Maze-Solving Tasks",
arXiv 2024.12
. [Paper] - "Causal World Representation in the GPT Model",
NIPS 2024 Workshop
. [Paper] -
Owl-1: "Owl-1: Omni World Model for Consistent Long Video Generation",
arXiv 2024.12
. [Paper] - "Navigation World Models",
arXiv 2024.12
. [Paper] [Website] - "Evaluating World Models with LLM for Decision Making",
arXiv 2024.11
. [Paper] -
LLMPhy: "LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models",
arXiv 2024.11
. [Paper] -
WebDreamer: "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents",
arXiv 2024.11
. [Paper] [Code] - "Scaling Laws for Pre-training Agents and World Models",
arXiv 2024.11
. [Paper] -
DINO-WM: "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning",
arXiv 2024.11
. [Paper] [Website] - "Learning World Models for Unconstrained Goal Navigation",
NIPS 2024
. [Paper] - "How Far is Video Generation from World Model: A Physical Law Perspective",
arXiv 2024.11
. [Paper] [Website] [Code] -
Adaptive World Models: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity",
NIPS 2024 Workshop Adaptive Foundation Models
. [Paper] -
LLMCWM: "Language Agents Meet Causality -- Bridging LLMs and Causal World Models",
arXiv 2024.10
. [Paper] [Code] - "Reward-free World Models for Online Imitation Learning",
arXiv 2024.10
. [Paper] - "Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation",
arXiv 2024.10
. [Paper] -
AVID: "AVID: Adapting Video Diffusion Models to World Models",
arXiv 2024.10
. [Paper] [Code] -
SMAC: "Grounded Answers for Multi-agent Decision-making Problem through Generative World Model",
NeurIPS 2024
. [Paper] -
OSWM: "One-shot World Models Using a Transformer Trained on a Synthetic Prior",
arXiv 2024.09
. [Paper] - "Making Large Language Models into World Models with Precondition and Effect Knowledge",
arXiv 2024.09
. [Paper] - "Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction",
arXiv 2024.08
. [Paper] -
MoReFree: "World Models Increase Autonomy in Reinforcement Learning",
arXiv 2024.08
. [Paper] [Project] -
UrbanWorld: "UrbanWorld: An Urban World Model for 3D City Generation",
arXiv 2024.07
. [Paper] -
PWM: "PWM: Policy Learning with Large World Models",
arXiv 2024.07
. [Paper] [Code] - "Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling",
arXiv 2024.07
. [Paper] -
GenRL: "GenRL: Multimodal foundation world models for generalist embodied agents",
arXiv 2024.06
. [Paper] [Code] -
DLLM: "World Models with Hints of Large Language Models for Goal Achieving",
arXiv 2024.06
. [Paper] - "Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model",
arXiv 2024.06
. [Paper] -
CoDreamer: "CoDreamer: Communication-Based Decentralised World Models",
arXiv 2024.06
. [Paper] -
Pandora: "Pandora: Towards General World Model with Natural Language Actions and Video States",
arXiv 2024.06
. [Paper] [Code] -
EBWM: "Cognitively Inspired Energy-Based World Models",
arXiv 2024.06
. [Paper] - "Evaluating the World Model Implicit in a Generative Model",
arXiv 2024.06
. [Paper] [Code] - "Transformers and Slot Encoding for Sample Efficient Physical World Modelling",
arXiv 2024.05
. [Paper] [Code] -
Puppeteer: "Hierarchical World Models as Visual Whole-Body Humanoid Controllers",
arXiv 2024.05
. [Paper] [Code] -
BWArea Model: "BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation",
arXiv 2024.05
. [Paper] -
WKM: "Agent Planning with World Knowledge Model",
arXiv 2024.05
. [Paper] [Code] -
Diamond: "Diffusion for World Modeling: Visual Details Matter in Atari",
arXiv 2024.05
. [Paper] [Code] - "Compete and Compose: Learning Independent Mechanisms for Modular World Models",
arXiv 2024.04
. [Paper] - "Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization",
arXiv 2024.03
. [Paper] [Code] -
V-JEPA: "V-JEPA: Video Joint Embedding Predictive Architecture",
Meta AI
. [Blog] [Paper] [Code] -
IWM: "Learning and Leveraging World Models in Visual Representation Learning",
Meta AI
. [Paper] -
Genie: "Genie: Generative Interactive Environments",
DeepMind
. [Paper] [Blog] -
Sora: "Video generation models as world simulators",
OpenAI
. [Technical report] -
LWM: "World Model on Million-Length Video And Language With RingAttention",
arXiv 2024.02
. [Paper] [Code] - "Planning with an Ensemble of World Models",
OpenReview
. [Paper] -
WorldDreamer: "WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens",
arXiv 2024.01
. [Paper] [Code] -
CWM: "Understanding Physical Dynamics with Counterfactual World Modeling",
ECCV 2024
. [Paper] [Code] -
Δ-IRIS: "Efficient World Models with Context-Aware Tokenization",
ICML 2024
. [Paper] [Code] -
LLM-Sim: "Can Language Models Serve as Text-Based World Simulators?",
ACL
. [Paper] [Code] -
AD3: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors",
ICML 2024
. [Paper] -
MAMBA: "MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning",
ICLR 2024
. [Paper] [Code] -
R2I: "Mastering Memory Tasks with World Models",
ICLR 2024
. [Paper] [Website] [Code] -
HarmonyDream: "HarmonyDream: Task Harmonization Inside World Models",
ICML 2024
. [Paper] [Code] -
REM: "Improving Token-Based World Models with Parallel Observation Prediction",
ICML 2024
. [Paper] [Code] - "Do Transformer World Models Give Better Policy Gradients?"",
ICML 2024
. [Paper] -
DreamSmooth: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing",
ICLR 2024
. [Paper] -
TD-MPC2: "TD-MPC2: Scalable, Robust World Models for Continuous Control",
ICLR 2024
. [Paper] [Torch Code] -
Hieros: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models",
ICML 2024
. [Paper] -
CoWorld: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning",
NeurIPS 2024
. [Paper]
- "Strengthening Generative Robot Policies through Predictive World Modeling",
arXiv 2025.02
. [Paper] [Website] -
Robotic World Model: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics",
arXiv 2025.01
. [Paper] -
RoboHorizon: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation",
arXiv 2025.01
. [Paper] -
Dream to Manipulate: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination",
arXiv 2024.12
. [Paper] [Website] -
WHALE: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making",
arXiv 2024.11
. [Paper] -
VisualPredicator: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning",
arXiv 2024.10
. [Paper] - "Multi-Task Interactive Robot Fleet Learning with Visual World Models",
CoRL 2024
. [Paper] [Code] -
X-MOBILITY: "X-MOBILITY: End-To-End Generalizable Navigation via World Modeling",
arXiv 2024.10
. [Paper] -
PIVOT-R: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation",
NeurIPS 2024
. [Paper] -
GLIMO: "Grounding Large Language Models In Embodied Environment With Imperfect World Models",
arXiv 2024.10
. [Paper] -
EVA: "EVA: An Embodied World Model for Future Video Anticipation",
arxiv 2024.10
. [Paper] [Website] -
PreLAR: "PreLAR: World Model Pre-training with Learnable Action Representation",
ECCV 2024
. [Paper] [Code] -
WMP: "World Model-based Perception for Visual Legged Locomotion",
arXiv 2024.09
. [Paper] [Project] -
R-AIF: "R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models",
arXiv 2024.09
. [Paper] - "Representing Positional Information in Generative World Models for Object Manipulation"
arXiv 2024.09
[Paper] -
DexSim2Real$^2$: "DexSim2Real$^2: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation",
arXiv 2024.09
. [Paper] -
DWL: "Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning",
RSS 2024 (Best Paper Award Finalist)
. [Paper] - "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics",
arXiv 2024.06
. [Paper] [Website] -
HRSSM: "Learning Latent Dynamic Robust Representations for World Models",
ICML 2024
. [Paper] [Code] -
RoboDreamer: "RoboDreamer: Learning Compositional World Models for Robot Imagination",
ICML 2024
. [Paper] [Code] -
COMBO: "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation",
ECCV 2024
. [Paper] [Website] [Code] -
3D-VLA: "3D-VLA: A 3D Vision-Language-Action Generative World Model",
ICML 2024
. [Paper] -
ManiGaussian: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation",
arXiv 2024.03
. [Paper] [Code]
## World Models for Autonomous Driving

- MaskGWM: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction", arXiv 2025.02. [Paper]
- Dream to Drive: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models", arXiv 2025.02. [Paper]
- "Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving", ICLR 2025. [Paper]
- "Dream to Drive with Predictive Individual World Model", IEEE TIV. [Paper] [Code]
- HERMES: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation", arXiv 2025.01. [Paper]
- AdaWM: "AdaWM: Adaptive World Model based Planning for Autonomous Driving", ICLR 2025. [Paper]
- AD-L-JEPA: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data", arXiv 2025.01. [Paper]
- DrivingWorld: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT", arXiv 2024.12. [Paper] [Code] [Project Page]
- DrivingGPT: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers", arXiv 2024.12. [Paper] [Project Page]
- "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training", arXiv 2024.12. [Paper]
- GEM: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control", arXiv 2024.12. [Paper] [Project Page]
- GaussianWorld: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction", arXiv 2024.12. [Paper] [Code]
- Doe-1: "Doe-1: Closed-Loop Autonomous Driving with Large World Model", arXiv 2024.12. [Paper] [Project Page] [Code]
- "Physical Informed Driving World Model", arXiv 2024.12. [Paper] [Project Page]
- InfiniCube: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models", arXiv 2024.12. [Paper] [Project Page]
- InfinityDrive: "InfinityDrive: Breaking Time Limits in Driving World Models", arXiv 2024.12. [Paper] [Project Page]
- ReconDreamer: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration", arXiv 2024.11. [Paper] [Project Page]
- Imagine-2-Drive: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles", ICRA 2025. [Paper] [Project Page]
- DriveDreamer4D: "World Models Are Effective Data Machines for 4D Driving Scene Representation", arXiv 2024.10. [Paper] [Project Page]
- DOME: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model", arXiv 2024.10. [Paper] [Project Page]
- SSR: "Does End-to-End Autonomous Driving Really Need Perception Tasks?", arXiv 2024.09. [Paper] [Code]
- "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models", arXiv 2024.09. [Paper]
- LatentDriver: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving", arXiv 2024.09. [Paper] [Code]
- RenderWorld: "World Model with Self-Supervised 3D Label", arXiv 2024.09. [Paper]
- OccLLaMA: "An Occupancy-Language-Action Generative World Model for Autonomous Driving", arXiv 2024.09. [Paper]
- DriveGenVLM: "Real-world Video Generation for Vision Language Model based Autonomous Driving", arXiv 2024.08. [Paper]
- Drive-OccWorld: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving", arXiv 2024.08. [Paper]
- CarFormer: "Self-Driving with Learned Object-Centric Representations", ECCV 2024. [Paper] [Code]
- BEVWorld: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space", arXiv 2024.07. [Paper] [Code]
- TOKEN: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving", arXiv 2024.07. [Paper]
- UMAD: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving", arXiv 2024.06. [Paper]
- SimGen: "Simulator-conditioned Driving Scene Generation", arXiv 2024.06. [Paper] [Code]
- AdaptiveDriver: "Planning with Adaptive World Models for Autonomous Driving", arXiv 2024.06. [Paper] [Code]
- UnO: "Unsupervised Occupancy Fields for Perception and Forecasting", CVPR 2024. [Paper] [Code]
- LAW: "Enhancing End-to-End Autonomous Driving with Latent World Model", arXiv 2024.06. [Paper] [Code]
- Delphi: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation", arXiv 2024.06. [Paper] [Code]
- OccSora: "4D Occupancy Generation Models as World Simulators for Autonomous Driving", arXiv 2024.05. [Paper] [Code]
- MagicDrive3D: "Controllable 3D Generation for Any-View Rendering in Street Scenes", arXiv 2024.05. [Paper] [Code]
- Vista: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability", NeurIPS 2024. [Paper] [Code]
- CarDreamer: "Open-Source Learning Platform for World Model based Autonomous Driving", arXiv 2024.05. [Paper] [Code]
- DriveSim: "Probing Multimodal LLMs as World Models for Driving", arXiv 2024.05. [Paper] [Code]
- DriveWorld: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving", CVPR 2024. [Paper]
- LidarDM: "Generative LiDAR Simulation in a Generated World", arXiv 2024.04. [Paper] [Code]
- SubjectDrive: "Scaling Generative Data in Autonomous Driving via Subject Control", arXiv 2024.03. [Paper] [Project]
- DriveDreamer-2: "LLM-Enhanced World Models for Diverse Driving Video Generation", arXiv 2024.03. [Paper] [Code]
- Think2Drive: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving", ECCV 2024. [Paper]
- MARL-CCE: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model", ECCV 2024. [Paper] [Code]
- GenAD: "Generalized Predictive Model for Autonomous Driving", CVPR 2024. [Paper] [Data]
- GenAD: "Generative End-to-End Autonomous Driving", ECCV 2024. [Paper] [Code]
- NeMo: "Neural Volumetric World Models for Autonomous Driving", ECCV 2024. [Paper]
- ViDAR: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving", CVPR 2024. [Paper] [Code]
- Drive-WM: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving", CVPR 2024. [Paper] [Code]
- Cam4DOcc: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications", CVPR 2024. [Paper] [Code]
- Panacea: "Panoramic and Controllable Video Generation for Autonomous Driving", CVPR 2024. [Paper] [Code]
- OccWorld: "Learning a 3D Occupancy World Model for Autonomous Driving", ECCV 2024. [Paper] [Code]
- Copilot4D: "Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion", ICLR 2024. [Paper]
- DrivingDiffusion: "Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model", ECCV 2024. [Paper] [Code]
- SafeDreamer: "Safe Reinforcement Learning with World Models", ICLR 2024. [Paper] [Code]
- MagicDrive: "Street View Generation with Diverse 3D Geometry Control", ICLR 2024. [Paper] [Code]
- DriveDreamer: "Towards Real-world-driven World Models for Autonomous Driving", ECCV 2024. [Paper] [Code]
- SEM2: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model", TITS. [Paper]
## Citation

If you find this repository useful, please consider citing this list:
```bibtex
@misc{leo2024worldmodelspaperslist,
  title   = {Awesome-World-Models},
  author  = {Leo Fan},
  journal = {GitHub repository},
  url     = {https://github.com/leofan90/Awesome-World-Models},
  year    = {2024},
}
```