The project is written primarily in HTML, is distributed under the MIT License, and was first published in 2025. Key topics include: 3d, autonomous-driving, awesome-list, embodied-ai, large-language-models.
:sunglasses: Awesome VLA for Autonomous Driving
Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, whose hand-crafted interfaces and rule-based components often struggle in complex, dynamic, or long-tailed scenarios. Their cascaded structure also amplifies upstream perception errors, undermining downstream planning and control.
This survey reviews vision-action (VA) models and vision-language-action (VLA) models for autonomous driving. We trace the evolution from early VA approaches to modern VLA frameworks, and organize existing methods into two principal paradigms:
- End-to-End VLA, which integrates perception, reasoning, and planning within a single model.
- Dual-System VLA, which separates slow deliberation (via VLMs) from fast, safety-critical execution (via planners).
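The difference between the two paradigms can be illustrated with a toy sketch. Everything here is hypothetical and not taken from any listed method: the `Observation` type, the function names, the `vlm_period` parameter, and the rule that the fast planner stops whenever an obstacle is visible are all assumptions made for illustration. The sketch only shows the control flow: a single call in the end-to-end case, versus an infrequent slow decision that a fast planner consumes on every tick in the dual-system case.

```python
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in metres, ego frame (assumed convention)


@dataclass
class Observation:
    """Toy stand-in for sensor input; real systems consume image tensors."""
    obstacle_ahead: bool


def _straight_line(speed: float) -> List[Waypoint]:
    """Three waypoints along the x axis at the given speed (toy trajectory)."""
    return [(speed * t, 0.0) for t in range(1, 4)]


def end_to_end_vla(obs: Observation) -> List[Waypoint]:
    """End-to-End VLA: one model maps the observation directly to a plan."""
    return _straight_line(0.0 if obs.obstacle_ahead else 2.0)


def slow_vlm(obs: Observation) -> str:
    """Slow system: a hypothetical VLM emits a high-level decision as text."""
    return "stop" if obs.obstacle_ahead else "keep_lane"


def fast_planner(decision: str, obs: Observation) -> List[Waypoint]:
    """Fast system: turns the latest decision into waypoints on every tick.

    Assumption: the planner may override the decision and stop when an
    obstacle is visible, so safety does not wait for the next VLM call.
    """
    if obs.obstacle_ahead or decision == "stop":
        return _straight_line(0.0)
    return _straight_line(2.0)


class DualSystemVLA:
    """Dual-System VLA: VLM every `vlm_period` ticks, planner every tick."""

    def __init__(self, vlm_period: int = 5) -> None:
        self.vlm_period = vlm_period
        self.tick = 0
        self.decision = "keep_lane"
        self.vlm_calls = 0

    def step(self, obs: Observation) -> List[Waypoint]:
        # Refresh the slow decision only on every vlm_period-th tick.
        if self.tick % self.vlm_period == 0:
            self.decision = slow_vlm(obs)
            self.vlm_calls += 1
        self.tick += 1
        return fast_planner(self.decision, obs)
```

In this sketch, ten steps with `vlm_period=5` trigger only two slow calls, while the planner runs all ten times. Real dual-system methods differ in how the slow output is passed to the planner.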
If you find this work helpful for your research, please consider citing our papers:
```bib
@article{survey_vla4ad,
    title   = {Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future},
    author  = {Tianshuai Hu and Xiaolu Liu and Song Wang and Yiyao Zhu and Ao Liang and Lingdong Kong and Guoyang Zhao and Zeying Gong and Jun Cen and Zhiyu Huang and Xiaoshuai Hao and Linfeng Li and Hang Song and Xiangtai Li and Jun Ma and Shaojie Shen and Jianke Zhu and Dacheng Tao and Ziwei Liu and Junwei Liang},
    journal = {arXiv preprint arXiv:2512.16760},
    year    = {2025},
}
```

```bib
@article{survey_3d_4d_world_models,
    title   = {{3D} and {4D} World Modeling: A Survey},
    author  = {Lingdong Kong and Wesley Yang and Jianbiao Mei and Youquan Liu and Ao Liang and Dekai Zhu and Dongyue Lu and Wei Yin and Xiaotao Hu and Mingkai Jia and Junyuan Deng and Kaiwen Zhang and Yang Wu and Tianyi Yan and Shenyuan Gao and Song Wang and Linfeng Li and Liang Pan and Yong Liu and Jianke Zhu and Wei Tsang Ooi and Steven C. H. Hoi and Ziwei Liu},
    journal = {arXiv preprint arXiv:2509.07996},
    year    = {2025},
}
```
<br>DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving
arXiv 2025
-
NaviHydra
<br>NaviHydra: Controllable Navigation-Guided End-to-End Autonomous Driving with Hydra Distillation
arXiv 2025
-
-
Mimir
<br>Mimir: Hierarchical Goal-Driven Diffusion with Uncertainty Propagation for End-to-End Autonomous Driving
arXiv 2025
-
FROST-Drive
<br>FROST-Drive: Scalable and Efficient End-to-End Driving with a Frozen Vision Encoder
arXiv 2026
-
-
DrivoR
<br>Driving on Registers
arXiv 2026
SPS
<br>See Less, Drive Better: Generalizable End-to-End Autonomous Driving via Foundation Models Stochastic Patch Selection
arXiv 2026
-
-
BevAD
<br>What Matters for Scalable and Robust Learning in End-to-End Driving Planners?
CVPR 2026
:three: Image-Based World Models
:timer_clock: In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
DriveDreamer
<br>DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving
ECCV 2024
GenAD
<br>GenAD: Generalized Predictive Model for Autonomous Driving
CVPR 2024
-
Drive-WM
<br>Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
CVPR 2024
DrivingWorld
<br>DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
arXiv 2024
Imagine-2-Drive
<br>Imagine-2-Drive: Leveraging High-Fidelity World Models via Multi-Modal Diffusion Policies
IROS 2025
-
DrivingGPT
<br>DrivingGPT: Unifying Driving World Modeling and Planning with Multi-Modal Autoregressive Transformers
ICCV 2025
-
Epona
<br>Epona: Autoregressive Diffusion World Model for Autonomous Driving
ICCV 2025
VaViM
<br>VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
arXiv 2025
UniDrive-WM
<br>UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving
arXiv 2026
-
DwD
<br>Driving with DINO: Vision Foundation Features as a Unified Bridge for Sim-to-Real Generation in Autonomous Driving
arXiv 2026
-
-
WorldDrive
<br>Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation
arXiv 2026
-
:four: Occupancy-Based World Models
:timer_clock: In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
OccWorld
<br>OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
ECCV 2024
NeMo
<br>Neural Volumetric World Models for Autonomous Driving
ECCV 2024
-
-
OccVAR
<br>OCCVAR: Scalable 4D Occupancy Prediction via Next-Scale Prediction
OpenReview 2024
-
-
RenderWorld
<br>RenderWorld: World Model with Self-Supervised 3D Label
arXiv 2024
-
-
DFIT-OccWorld
<br>An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training
arXiv 2024
-
-
Drive-OccWorld
<br>Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
AAAI 2025
T³Former
<br>Temporal Triplane Transformers as Occupancy World Models
arXiv 2025
-
-
OmniNWM
<br>OmniNWM: Omniscient Driving Navigation World Models
arXiv 2025
-
AD-R1
<br>AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models
arXiv 2025
-
-
SparseOccVLA
<br>SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning
arXiv 2026
-
:five: Latent-Based World Models
:timer_clock: In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
Covariate-Shift
<br>Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models
arXiv 2024
-
-
World4Drive
<br>World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model
ICCV 2025
-
-
WoTE
<br>End-to-End Driving with Online Trajectory Evaluation via BEV World Model
ICCV 2025
-
LAW
<br>Enhancing End-to-End Autonomous Driving with Latent World Model
ICLR 2025
-
SSR
<br>Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
ICLR 2025
-
Echo-Planning
<br>Echo Planning for Autonomous Driving: From Current Observations to Future Trajectories and Back
arXiv 2025
-
-
SeerDrive
<br>Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution
NeurIPS 2025
-
Drive-JEPA
<br>Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving
arXiv 2026
-
2. Vision-Language-Action Models
:one: Textual Action Generator
:timer_clock: In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
DriveMLM
<br>DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
arXiv 2023
-
RAG-Driver
<br>RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model
RSS 2024
RDA-Driver
<br>Making Large Language Models Better Planners with Reasoning-Decision Alignment
ECCV 2024
-
-
DriveLM
<br>DriveLM: Driving with Graph Visual Question Answering
ECCV 2024
DriveGPT4
<br>DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
RA-L 2024
-
DriVLMe
<br>DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experience
IROS 2024
LLaDA
<br>Driving Everywhere with Large Language Model Policy Adaptation
CVPR 2024
VLAAD
<br>VLAAD: Vision and Language Assistant for Autonomous Driving
WACVW 2024
-
OccLLaMA
<br>OccLLaMA: A Unified Occupancy-Language-Action World Model for Understanding and Generation Tasks in Autonomous Driving
arXiv 2024
-
Doe-1
<br>Doe-1: Closed-Loop Autonomous Driving with Large World Model
arXiv 2024
LINGO-2
<br>LINGO-2: Driving with Natural Language
-
-
SafeAuto
<br>SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
ICML 2025
-
OpenEMMA
<br>OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
WACV 2025
-
ReasonPlan
<br>ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving
CoRL 2025
-
WKER
<br>World Knowledge-Enhanced Reasoning Using Instruction-Guided Interactor in Autonomous Driving
AAAI 2025
-
-
OmniDrive
<br>OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning
CVPR 2025
-
S4-Driver
<br>S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
CVPR 2025
-
Occ-LLM
<br>Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models
ICRA 2025
-
-
DriveBench
<br>Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
ICCV 2025
FutureSightDrive
<br>FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
NeurIPS 2025
ImpromptuVLA
<br>Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
NeurIPS 2025
Sce2DriveX
<br>Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
RA-L 2025
-
-
EMMA
<br>EMMA: End-to-End Multimodal Model for Autonomous Driving
TMLR 2025
-
DriveAgent-R1
<br>DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Hybrid Thinking and Active Perception
arXiv 2025
-
-
Drive-R1
<br>Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning
arXiv 2025
-
-
FastDriveVLA
<br>FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-Based Token Pruning
arXiv 2025
-
-
WiseAD
<br>WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model
arXiv 2025
AutoDrive-R²
<br>AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
arXiv 2025
-
-
OmniReason
<br>OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving
arXiv 2025
-
-
OpenREAD
<br>OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
arXiv 2025
-
dVLM-AD
<br>dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning
arXiv 2025
-
-
PLA
<br>A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving
arXiv 2025
-
-
AlphaDrive
<br>AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
arXiv 2025
-
CoReVLA
<br>CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine
arXiv 2025
WAM-Diff
<br>WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving
arXiv 2025
-
:two: Numerical Action Generator
:timer_clock: In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
LMDrive
<br>LMDrive: Closed-Loop End-to-End Driving with Large Language Models
CVPR 2024
BEVDriver
<br>BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving
IROS 2025
-
-
CoVLA-Agent
<br>CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
WACV 2025
-
ORION
<br>ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
ICCV 2025
SimLingo
<br>SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
CVPR 2025
DriveGPT4-V2
<br>DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
CVPR 2025
-
-
AutoVLA
<br>AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
NeurIPS 2025
DriveMoE
<br>DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
arXiv 2025
DSDrive
<br>DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning
arXiv 2025
-
-
OccVLA
<br>OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision
arXiv 2025
-
-
VDRive
<br>VDRive: Leveraging Reinforced VLA and Diffusion Policy for End-to-End Autonomous Driving
arXiv 2025
-
-
ReflectDrive
<br>Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
arXiv 2025
-
E3AD
<br>E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving
arXiv 2025
-
-
LCDrive
<br>Latent Chain-of-Thought World Modeling for End-to-End Driving
arXiv 2025
-
-
Alpamayo-R1
<br>Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
arXiv 2025
-
-
UniUGP
<br>UniUGP: Unifying Understanding, Generation, and Planning for End-to-End Autonomous Driving
arXiv 2025
-
-
MindDrive
<br>MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving
arXiv 2025
-
-
AdaThinkDrive
<br>AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving
arXiv 2025
-
-
Percept-WAM
<br>Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving
arXiv 2025
-
-
Reasoning-VLA
<br>Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
arXiv 2025
-
-
SpaceDrive
<br>SpaceDrive: Infusing Spatial Awareness into VLM-Based Autonomous Driving
arXiv 2025
-
-
OpenDriveVLA
<br>OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
AAAI 2026
WAM-Flow
<br>WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving
CVPR 2026
ColaVLA
<br>ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving
CVPR 2026
:three: Explicit Action Guidance
:timer_clock: In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
DriveVLM
<br>DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
CoRL 2024
-
LeapAD
<br>Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
NeurIPS 2024
FasionAD
<br>FASIONAD: Fast and Slow Fusion Thinking Systems for Human-Like Autonomous Driving with Adaptive Feedback
arXiv 2024
-
-
Senna
<br>Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
arXiv 2024
-
DualAD
<br>DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving
IROS 2025
DME-Driver
<br>DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
AAAI 2025
-
-
SOLVE
<br>SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
CVPR 2025
-
-
ReAL-AD
<br>ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving
ICCV 2025
-
LeapVAD
<br>LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking
TNNLS 2025
-
-
DiffVLA
<br>DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
arXiv 2025
-
-
FasionAD++
<br>FASIONAD++: Integrating High-Level Instruction and Information Bottleneck in Fast-Slow Fusion Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback
arXiv 2025
-
-
HiST-VLA
<br>HiST-VLA: A Hierarchical Spatio-Temporal Vision-Language-Action Model for End-to-End Autonomous Driving
arXiv 2026
-
-
:four: Implicit Representations Transfer
:timer_clock: In chronological order, from the earliest to the latest.
Model
Paper
Venue
Website
GitHub
VLP
<br>VLP: Vision Language Planning for Autonomous Driving
CVPR 2024
-
-
VLM-AD
<br>VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision
CoRL 2025
-
-
DiMA
<br>Distilling Multi-modal Large Language Models for Autonomous Driving
CVPR 2025
-
-
DINO-Foresight
<br>DINO-Foresight: Looking into the Future with DINO
NeurIPS 2025
ALN-P3
<br>ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving
arXiv 2025
-
-
VERDI
<br>VERDI: VLM-Embedded Reasoning for Autonomous Driving
arXiv 2025
-
-
VLM-E2E
<br>VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion
arXiv 2025
-
-
ReCogDrive
<br>ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
arXiv 2025
InsightDrive
<br>InsightDrive: Insight Scene Representation for End-to-End Autonomous Driving
arXiv 2025
-
NetRoller
<br>NetRoller: Interfacing General and Specialized Models for End-to-End Autonomous Driving
arXiv 2025
-
ViLaD
<br>ViLaD: A Large Vision Language Diffusion Framework for End-to-End Autonomous Driving
arXiv 2025
-
-
OmniScene
<br>OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving
arXiv 2025
-
-
LMAD
<br>LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving
arXiv 2025
-
-
BEVLM
<br>BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations
arXiv 2026
-
-
3. Datasets & Benchmarks
:timer_clock: In chronological order, from the earliest to the latest.
:one: Vision-Action Datasets
Dataset
Paper
Venue
Website
GitHub
BDD100K
<br>BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning
CVPR 2020
nuScenes
<br>nuScenes: A Multimodal Dataset for Autonomous Driving
CVPR 2020
-
Waymo
<br>Scalability in Perception for Autonomous Driving: Waymo Open Dataset
CVPR 2020
nuPlan
<br>nuPlan: A Closed-Loop ML-Based Planning Benchmark for Autonomous Vehicles
arXiv 2021
Argoverse 2
<br>Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
NeurIPS 2021
Bench2Drive
<br>Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-to-End Autonomous Driving
NeurIPS 2024
-
RoboBEV
<br>Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving
TPAMI 2025
-
WOD-E2E
<br>WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-Tail Scenarios
arXiv 2025
navdream
<br>The Constant Eye: Benchmarking and Bridging Appearance Robustness in Autonomous Driving
arXiv 2026
-
-
:two: Vision-Language-Action Datasets
Dataset
Paper
Venue
Website
GitHub
BDD-X
<br>Textual Explanations for Self-Driving Vehicles
ECCV 2018
-
Talk2Car
<br>Talk2Car: Predicting Physical Trajectories for Natural Language Commands
IEEE Access 2022
-
SDN
<br>DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents
EMNLP 2022
-
DriveMLM
<br>DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
arXiv 2023
-
LMDrive
<br>LMDrive: Closed-Loop End-to-End Driving with Large Language Models
CVPR 2024
DriveLM-nuScenes
<br>DriveLM: Driving with Graph Visual Question Answering
ECCV 2024
HBD
<br>DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
AAAI 2025
-
-
VLAAD
<br>VLAAD: Vision and Language Assistant for Autonomous Driving
WACVW 2024
-
SUP-AD
<br>DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
CoRL 2024
-
NuInstruct
<br>Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
CVPR 2024
-
WOMD-Reasoning
<br>WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving
ICML 2025
DriveCoT
<br>DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving
arXiv 2024
-
Reason2Drive
<br>Reason2Drive: Towards Interpretable and Chain-Based Reasoning for Autonomous Driving
ECCV 2024
-
DriveBench
<br>Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
ICCV 2025
MetaAD
<br>AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
arXiv 2025
OmniDrive
<br>OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning
CVPR 2025
-
NuInteract
<br>Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
arXiv 2025
-
-
DriveAction
<br>DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models
arXiv 2025
-
-
ImpromptuVLA
<br>Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
NeurIPS 2025
CoVLA
<br>CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
WACV 2025
-
OmniReason-nuScenes
<br>OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving
arXiv 2025
-
-
OmniReason-B2D
<br>OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving