awesome-vla-for-ad


😎 Awesome VLA for Autonomous Driving

Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, whose hand-crafted interfaces and rule-based components often struggle in complex, dynamic, or long-tailed scenarios. Their cascaded structure also amplifies upstream perception errors, undermining downstream planning and control.

This survey reviews vision-action (VA) models and vision-language-action (VLA) models for autonomous driving. We trace the evolution from early VA approaches to modern VLA frameworks, and organize existing methods into two principal paradigms:

  • End-to-End VLA, which integrates perception, reasoning, and planning within a single model.
  • Dual-System VLA, which separates slow deliberation (via VLMs) from fast, safety-critical execution (via planners).
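The dual-system split above can be sketched in a few lines of Python. This is purely illustrative and not taken from any surveyed method: all names (`SlowReasoner`, `FastController`, the tick rates, the staleness fallback) are hypothetical, chosen only to show the pattern of a slow deliberative planner feeding a fast, safety-critical control loop.

```python
# Illustrative sketch (not from the survey) of the dual-system VLA pattern:
# a slow "System 2" reasoner proposes a high-level maneuver at a low rate,
# while a fast "System 1" controller acts every tick and falls back to a
# conservative default if the reasoner's latest output grows stale.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Maneuver:
    command: str    # e.g. "keep_lane", "yield"
    issued_at: int  # tick at which the reasoner produced it

class SlowReasoner:
    """Stand-in for a VLM that deliberates only every `period` ticks."""
    def __init__(self, period: int = 10):
        self.period = period

    def maybe_plan(self, tick: int, obstacle_ahead: bool) -> Optional[Maneuver]:
        if tick % self.period != 0:
            return None  # still "thinking"; no new plan this tick
        return Maneuver("yield" if obstacle_ahead else "keep_lane", tick)

class FastController:
    """Stand-in for a classical planner that must act on every tick."""
    def __init__(self, staleness_limit: int = 20):
        self.staleness_limit = staleness_limit
        self.latest: Optional[Maneuver] = None

    def act(self, tick: int, plan: Optional[Maneuver]) -> str:
        if plan is not None:
            self.latest = plan
        # Safety-critical fallback: never execute an outdated plan.
        if self.latest is None or tick - self.latest.issued_at > self.staleness_limit:
            return "brake"
        return self.latest.command

reasoner, controller = SlowReasoner(), FastController()
actions = [controller.act(t, reasoner.maybe_plan(t, obstacle_ahead=(t >= 10)))
           for t in range(12)]
```

The point of the decoupling is that the control loop keeps its own cadence and safety envelope regardless of how long the language model deliberates.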

For more details, please refer to our 📚 Paper, 🌐 Project Page, and 🤗 HuggingFace Leaderboard.

📚 Citation

If you find this work helpful for your research, please consider citing our paper:

@article{survey_vla4ad,
    title   = {Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future},
    author  = {Tianshuai Hu and Xiaolu Liu and Song Wang and Yiyao Zhu and Ao Liang and Lingdong Kong and Guoyang Zhao and Zeying Gong and Jun Cen and Zhiyu Huang and Xiaoshuai Hao and Linfeng Li and Hang Song and Xiangtai Li and Jun Ma and Shaojie Shen and Jianke Zhu and Dacheng Tao and Ziwei Liu and Junwei Liang},
    journal = {arXiv preprint arXiv:2512.16760},
    year    = {2025},
}

Table of Contents

1. Vision-Action Models

1️⃣ Action-Only Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| LBC | Learning by Cheating (arXiv) | CoRL 2020 | - | GitHub |
| Latent-DRL | End-to-End Model-Free Reinforcement Learning for Urban Driving using Implicit Affordances (arXiv) | CVPR 2020 | - | - |
| NEAT | NEAT: Neural Attention Fields for End-to-End Autonomous Driving (arXiv) | ICCV 2021 | - | GitHub |
| Roach | End-to-End Urban Driving by Imitating a Reinforcement Learning Coach (arXiv) | ICCV 2021 | Website | GitHub |
| WoR | Learning to Drive from A World on Rails (arXiv) | ICCV 2021 | Website | GitHub |
| TCP | Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline (arXiv) | NeurIPS 2022 | - | GitHub |
| Urban-Driver | Urban Driver: Learning to Drive from Real-world Demonstrations Using Policy Gradients (arXiv) | CoRL 2022 | Website | GitHub |
| LAV | Learning from All Vehicles (arXiv) | CVPR 2022 | Website | GitHub |
| TransFuser | TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving (arXiv) | TPAMI 2023 | - | GitHub |
| GRI | GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving (arXiv) | Robotics 2023 | - | - |
| BEVPlanner | Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? (arXiv) | CVPR 2024 | - | GitHub |
| Raw2Drive | Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) (arXiv) | NeurIPS 2025 | - | - |
| RAD | RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning (arXiv) | NeurIPS 2025 | Website | - |
| TrajDiff | TrajDiff: End-to-End Autonomous Driving without Perception Annotation (arXiv) | arXiv 2025 | - | GitHub |
| SimScale | SimScale: Learning to Drive via Real-World Simulation at Scale (arXiv) | arXiv 2025 | Website | GitHub |

2️⃣ Perception-Action Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| ST-P3 | ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning (arXiv) | ECCV 2022 | - | GitHub |
| UniAD | Planning-Oriented Autonomous Driving (arXiv) | CVPR 2023 | - | GitHub |
| VAD | VAD: Vectorized Scene Representation for Efficient Autonomous Driving (arXiv) | ICCV 2023 | - | GitHub |
| OccNet | Scene as Occupancy (arXiv) | ICCV 2023 | - | GitHub |
| GenAD | GenAD: Generative End-to-End Autonomous Driving (arXiv) | ECCV 2024 | - | GitHub |
| PARA-Drive | PARA-Drive: Parallelized Architecture for Real-Time Autonomous Driving (CVPR) | CVPR 2024 | Website | - |
| Hydra-MDP | Hydra-MDP: End-to-End Multimodal Planning with Multi-Target Hydra-Distillation (CVPRW) | CVPRW 2024 | Website | GitHub |
| SparseAD | SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving (arXiv) | arXiv 2024 | - | - |
| GaussianAD | GaussianAD: Gaussian-Centric End-to-End Autonomous Driving (arXiv) | arXiv 2024 | - | - |
| DiFSD | DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving (arXiv) | arXiv 2024 | - | GitHub |
| DriveTransformer | DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving (arXiv) | ICLR 2025 | - | GitHub |
| SparseDrive | SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation (arXiv) | ICRA 2025 | - | GitHub |
| DiffusionDrive | DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving (arXiv) | CVPR 2025 | - | GitHub |
| GoalFlow | GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving (arXiv) | CVPR 2025 | Website | GitHub |
| GuideFlow | GuideFlow: Constraint-Guided Flow Matching for Planning in End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| ETA | ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models (arXiv) | arXiv 2025 | - | GitHub |
| Geo | Spatial Retrieval Augmented Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| DiffusionDriveV2 | DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| NaviHydra | NaviHydra: Controllable Navigation-Guided End-to-End Autonomous Driving with Hydra Distillation (arXiv) | arXiv 2025 | - | - |
| Mimir | Mimir: Hierarchical Goal-Driven Diffusion with Uncertainty Propagation for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| FROST-Drive | FROST-Drive: Scalable and Efficient End-to-End Driving with a Frozen Vision Encoder (arXiv) | arXiv 2026 | - | - |
| DrivoR | Driving on Registers (arXiv) | arXiv 2026 | Website | GitHub |
| SPS | See Less, Drive Better: Generalizable End-to-End Autonomous Driving via Foundation Models Stochastic Patch Selection (arXiv) | arXiv 2026 | - | - |

3️⃣ Image-Based World Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| DriveDreamer | DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving (arXiv) | ECCV 2024 | Website | GitHub |
| GenAD | GenAD: Generalized Predictive Model for Autonomous Driving (arXiv) | CVPR 2024 | - | GitHub |
| Drive-WM | Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving (arXiv) | CVPR 2024 | Website | GitHub |
| DrivingWorld | DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT (arXiv) | arXiv 2024 | Website | GitHub |
| Imagine-2-Drive | Imagine-2-Drive: Leveraging High-Fidelity World Models via Multi-Modal Diffusion Policies (arXiv) | IROS 2025 | Website | - |
| DrivingGPT | DrivingGPT: Unifying Driving World Modeling and Planning with Multi-Modal Autoregressive Transformers (arXiv) | ICCV 2025 | Website | - |
| Epona | Epona: Autoregressive Diffusion World Model for Autonomous Driving (arXiv) | ICCV 2025 | Website | GitHub |
| VaViM | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling (arXiv) | arXiv 2025 | Website | GitHub |
| UniDrive-WM | UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving (arXiv) | arXiv 2026 | Website | - |

4️⃣ Occupancy-Based World Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| OccWorld | OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving (arXiv) | ECCV 2024 | Website | GitHub |
| NeMo | Neural Volumetric World Models for Autonomous Driving (ECCV) | ECCV 2024 | - | - |
| OccVAR | OCCVAR: Scalable 4D Occupancy Prediction via Next-Scale Prediction (OpenReview) | OpenReview 2024 | - | - |
| RenderWorld | RenderWorld: World Model with Self-Supervised 3D Label (arXiv) | arXiv 2024 | - | - |
| DFIT-OccWorld | An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training (arXiv) | arXiv 2024 | - | - |
| Drive-OccWorld | Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving (arXiv) | AAAI 2025 | Website | GitHub |
| T³Former | Temporal Triplane Transformers as Occupancy World Models (arXiv) | arXiv 2025 | - | - |
| AD-R1 | AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models (arXiv) | arXiv 2025 | - | - |
| SparseOccVLA | SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning (arXiv) | arXiv 2026 | - | GitHub |

5️⃣ Latent-Based World Models

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| Covariate-Shift | Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models (arXiv) | arXiv 2024 | - | - |
| World4Drive | World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model (arXiv) | ICCV 2025 | - | - |
| WoTE | End-to-End Driving with Online Trajectory Evaluation via BEV World Model (arXiv) | ICCV 2025 | - | GitHub |
| LAW | Enhancing End-to-End Autonomous Driving with Latent World Model (arXiv) | ICLR 2025 | - | GitHub |
| SSR | Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving (arXiv) | ICLR 2025 | - | GitHub |
| Echo-Planning | Echo Planning for Autonomous Driving: From Current Observations to Future Trajectories and Back (arXiv) | arXiv 2025 | - | - |
| SeerDrive | Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution (arXiv) | NeurIPS 2025 | - | GitHub |

2. Vision-Language-Action Models

1️⃣ Textual Action Generator

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| DriveMLM | DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (arXiv) | arXiv 2023 | - | GitHub |
| RAG-Driver | RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model (arXiv) | RSS 2024 | Website | GitHub |
| RDA-Driver | Making Large Language Models Better Planners with Reasoning-Decision Alignment (arXiv) | ECCV 2024 | - | - |
| DriveLM | DriveLM: Driving with Graph Visual Question Answering (arXiv) | ECCV 2024 | Website | GitHub |
| DriveGPT4 | DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model (arXiv) | RA-L 2024 | Website | - |
| DriVLMe | DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experience (arXiv) | IROS 2024 | Website | GitHub |
| LLaDA | Driving Everywhere with Large Language Model Policy Adaptation (arXiv) | CVPR 2024 | Website | GitHub |
| VLAAD | VLAAD: Vision and Language Assistant for Autonomous Driving (WACVW) | WACVW 2024 | - | GitHub |
| OccLLaMA | OccLLaMA: A Unified Occupancy-Language-Action World Model for Understanding and Generation Tasks in Autonomous Driving (arXiv) | arXiv 2024 | Website | - |
| Doe-1 | Doe-1: Closed-Loop Autonomous Driving with Large World Model (arXiv) | arXiv 2024 | Website | GitHub |
| LINGO-2 | LINGO-2: Driving with Natural Language (arXiv) | - | Website | - |
| SafeAuto | SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models (arXiv) | ICML 2025 | - | GitHub |
| OpenEMMA | OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving (arXiv) | WACV 2025 | - | GitHub |
| ReasonPlan | ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving (arXiv) | CoRL 2025 | - | GitHub |
| WKER | World Knowledge-Enhanced Reasoning Using Instruction-Guided Interactor in Autonomous Driving (arXiv) | AAAI 2025 | - | - |
| OmniDrive | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning (arXiv) | CVPR 2025 | - | GitHub |
| S4-Driver | S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation (arXiv) | CVPR 2025 | Website | - |
| Occ-LLM | Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models (arXiv) | ICRA 2025 | - | - |
| DriveBench | Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives (arXiv) | ICCV 2025 | Website | GitHub |
| FutureSightDrive | FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving (arXiv) | NeurIPS 2025 | Website | GitHub |
| ImpromptuVLA | Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models (arXiv) | NeurIPS 2025 | Website | GitHub |
| Sce2DriveX | Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning (arXiv) | RA-L 2025 | - | - |
| EMMA | EMMA: End-to-End Multimodal Model for Autonomous Driving (arXiv) | TMLR 2025 | Website | - |
| DriveAgent-R1 | DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Hybrid Thinking and Active Perception (arXiv) | arXiv 2025 | - | - |
| Drive-R1 | Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning (arXiv) | arXiv 2025 | - | - |
| FastDriveVLA | FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-Based Token Pruning (arXiv) | arXiv 2025 | - | - |
| WiseAD | WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model (arXiv) | arXiv 2025 | Website | GitHub |
| AutoDrive-R² | AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| OmniReason | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| OpenREAD | OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic (arXiv) | arXiv 2025 | - | GitHub |
| dVLM-AD | dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning (arXiv) | arXiv 2025 | - | - |
| PLA | A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| AlphaDrive | AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning (arXiv) | arXiv 2025 | - | GitHub |
| CoReVLA | CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine (arXiv) | arXiv 2025 | Website | GitHub |
| WAM-Diff | WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |

2️⃣ Numerical Action Generator

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| LMDrive | LMDrive: Closed-Loop End-to-End Driving with Large Language Models (arXiv) | CVPR 2024 | Website | GitHub |
| BEVDriver | BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving (arXiv) | IROS 2025 | - | - |
| CoVLA-Agent | CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving (arXiv) | WACV 2025 | Website | - |
| ORION | ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation (arXiv) | ICCV 2025 | Website | GitHub |
| SimLingo | SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment (arXiv) | CVPR 2025 | Website | GitHub |
| DriveGPT4-V2 | DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving (CVPR) | CVPR 2025 | - | - |
| AutoVLA | AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning (arXiv) | NeurIPS 2025 | Website | GitHub |
| DriveMoE | DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving (arXiv) | arXiv 2025 | Website | GitHub |
| DSDrive | DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning (arXiv) | arXiv 2025 | - | - |
| OccVLA | OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision (arXiv) | arXiv 2025 | - | - |
| VDRive | VDRive: Leveraging Reinforced VLA and Diffusion Policy for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| ReflectDrive | Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| E3AD | E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| LCDrive | Latent Chain-of-Thought World Modeling for End-to-End Driving (arXiv) | arXiv 2025 | - | - |
| Alpamayo-R1 | Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail (arXiv) | arXiv 2025 | - | - |
| UniUGP | UniUGP: Unifying Understanding, Generation, and Planning for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| MindDrive | MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| AdaThinkDrive | AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| Percept-WAM | Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| Reasoning-VLA | Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| SpaceDrive | SpaceDrive: Infusing Spatial Awareness into VLM-Based Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| OpenDriveVLA | OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model (arXiv) | AAAI 2026 | Website | GitHub |
| WAM-Flow | WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |

3️⃣ Explicit Action Guidance

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| DriveVLM | DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (arXiv) | CoRL 2024 | Website | - |
| LeapAD | Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving (arXiv) | NeurIPS 2024 | Website | GitHub |
| FasionAD | FASIONAD: Fast and Slow Fusion Thinking Systems for Human-Like Autonomous Driving with Adaptive Feedback (arXiv) | arXiv 2024 | - | - |
| Senna | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving (arXiv) | arXiv 2024 | - | GitHub |
| DualAD | DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving (arXiv) | IROS 2025 | Website | GitHub |
| DME-Driver | DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (arXiv) | AAAI 2025 | - | - |
| SOLVE | SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving (arXiv) | CVPR 2025 | - | - |
| ReAL-AD | ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving (arXiv) | ICCV 2025 | Website | - |
| LeapVAD | LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking (arXiv) | TNNLS 2025 | - | - |
| DiffVLA | DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| FasionAD++ | FASIONAD++: Integrating High-Level Instruction and Information Bottleneck in Fast-Slow Fusion Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback (arXiv) | arXiv 2025 | - | - |

4️⃣ Implicit Representations Transfer

⏲️ In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| VLP | VLP: Vision Language Planning for Autonomous Driving (arXiv) | CVPR 2024 | - | - |
| VLM-AD | VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision (arXiv) | CoRL 2025 | - | - |
| DiMA | Distilling Multi-modal Large Language Models for Autonomous Driving (arXiv) | CVPR 2025 | - | - |
| DINO-Foresight | DINO-Foresight: Looking into the Future with DINO (arXiv) | NeurIPS 2025 | Website | GitHub |
| ALN-P3 | ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| VERDI | VERDI: VLM-Embedded Reasoning for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| VLM-E2E | VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion (arXiv) | arXiv 2025 | - | - |
| ReCogDrive | ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | Website | GitHub |
| InsightDrive | InsightDrive: Insight Scene Representation for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| NetRoller | NetRoller: Interfacing General and Specialized Models for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | GitHub |
| ViLaD | ViLaD: A Large Vision Language Diffusion Framework for End-to-End Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| OmniScene | OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| LMAD | LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving (arXiv) | arXiv 2025 | - | - |

3. Datasets & Benchmarks

⏲️ In chronological order, from the earliest to the latest.

1️⃣ Vision-Action Datasets

| Dataset | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| BDD100K | BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning (arXiv) | CVPR 2020 | Website | GitHub |
| nuScenes | nuScenes: A Multimodal Dataset for Autonomous Driving (arXiv) | CVPR 2020 | Website | - |
| Waymo | Scalability in Perception for Autonomous Driving: Waymo Open Dataset (arXiv) | CVPR 2020 | Website | GitHub |
| nuPlan | nuPlan: A Closed-Loop ML-Based Planning Benchmark for Autonomous Vehicles (arXiv) | arXiv 2021 | Website | GitHub |
| Argoverse 2 | Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting (arXiv) | NeurIPS 2021 | Website | GitHub |
| Bench2Drive | Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-to-End Autonomous Driving (arXiv) | NeurIPS 2024 | - | GitHub |
| RoboBEV | Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving (arXiv) | TPAMI 2025 | - | GitHub |
| WOD-E2E | WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-Tail Scenarios (arXiv) | arXiv 2025 | Website | GitHub |

2️⃣ Vision-Language-Action Datasets

| Dataset | Paper | Venue | Website | GitHub |
|:---|:---|:---|:---:|:---:|
| BDD-X | Textual Explanations for Self-Driving Vehicles (arXiv) | ECCV 2018 | - | GitHub |
| Talk2Car | Talk2Car: Predicting Physical Trajectories for Natural Language Commands (IEEE) | IEEE Access 2022 | - | GitHub |
| SDN | DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents (arXiv) | EMNLP 2022 | - | GitHub |
| DriveMLM | DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (arXiv) | arXiv 2023 | - | GitHub |
| LMDrive | LMDrive: Closed-Loop End-to-End Driving with Large Language Models (arXiv) | CVPR 2024 | Website | GitHub |
| DriveLM-nuScenes | DriveLM: Driving with Graph Visual Question Answering (arXiv) | ECCV 2024 | Website | GitHub |
| HBD | DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (arXiv) | AAAI 2025 | - | - |
| VLAAD | VLAAD: Vision and Language Assistant for Autonomous Driving (WACVW) | WACVW 2024 | - | GitHub |
| SUP-AD | DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (arXiv) | CoRL 2024 | Website | - |
| NuInstruct | Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models (arXiv) | CVPR 2024 | - | GitHub |
| WOMD-Reasoning | WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving (arXiv) | ICML 2025 | Website | GitHub |
| DriveCoT | DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving (arXiv) | arXiv 2024 | Website | - |
| Reason2Drive | Reason2Drive: Towards Interpretable and Chain-Based Reasoning for Autonomous Driving (arXiv) | ECCV 2024 | - | GitHub |
| DriveBench | Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives (arXiv) | ICCV 2025 | Website | GitHub |
| MetaAD | AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning (arXiv) | arXiv 2025 | Website | GitHub |
| OmniDrive | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning (arXiv) | CVPR 2025 | - | GitHub |
| NuInteract | Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| DriveAction | DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models (arXiv) | arXiv 2025 | - | - |
| ImpromptuVLA | Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models (arXiv) | arXiv 2025 | Website | GitHub |
| CoVLA | CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving (arXiv) | WACV 2025 | Website | - |
| OmniReason-nuScenes | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving (arXiv) | arXiv 2025 | - | - |
| OmniReason-B2D | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving (arXiv) | arXiv 2025 | - | - |

4. Applications

5. Other Resources
