
Depth any camera

[CVPR 2025] Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

From yuliangguo · Updated June 12, 2026

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera. The project is written primarily in Python, distributed under the MIT License, and first published in 2025. Key topics include: 3d-perception, foundation-models, metric-depth-estimation, monocular-depth-estimation, universal-camera-model.

<div align="center"> <h1>Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera</h1>

Yuliang Guo<sup>1*†</sup> · Sparsh Garg<sup>2†</sup> · S. Mahdi H. Miangoleh<sup>3</sup> · Xinyu Huang<sup>1</sup> · Liu Ren<sup>1</sup>

<sup>1</sup>Bosch Research North America    <sup>2</sup>Carnegie Mellon University    <sup>3</sup>Simon Fraser University    

 *corresponding author †equal technical contribution

<a href="https://arxiv.org/abs/2501.02464"><img src='https://img.shields.io/badge/arXiv-Depth Any Camera-red' alt='Paper PDF'></a>
<a href='https://yuliangguo.github.io/depth-any-camera/'><img src='https://img.shields.io/badge/Project_Page-Depth Any Camera-green' alt='Project Page'></a>
<a href='https://huggingface.co/yuliangguo/depth-any-camera'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow'></a>

CVPR 2025

</div> <p align="center"> <img src="docs/teaser.png" alt="teaser" style="width: 70%;"> </p>

Depth Any Camera (DAC) is a zero-shot metric depth estimation framework that extends a perspective-trained model to handle any type of camera with varying FoVs effectively.

Notably, DAC can be trained exclusively on perspective images, yet it generalizes seamlessly to fisheye and 360 cameras without requiring specialized training data. Key features include:

  1. Zero-shot metric depth estimation on fisheye and 360 images, significantly outperforming prior metric depth SoTA Metric3D-v2 and UniDepth.
  2. Geometry-based training framework adaptable to any network architecture, extendable to other 3D perception tasks.

Tired of collecting new data for specific cameras? DAC maximizes the utility of all existing 3D data for training, regardless of the specific camera types used in new applications.

News

  • 2026-03-28: The code for UniDAC (CVPR 2026) -- a universal version of the DAC model -- is released.
  • 2025-03-12: Add scripts to run fisheye scenes of ZipNeRF and ScanNet++ datasets. Results are downloadable from ZipNeRF DAC results, and ScanNet++ DAC results to facilitate NeRF and Gaussian Splatting from generic cameras, e.g., 3DGEER.
  • 2025-02-26: Depth Any Camera accepted by CVPR 2025!
  • 2025-01-21: Demo code for easy setup and usage.
  • 2025-01-13: Release of pre-trained DepthAnyCamera (DAC) models trained on moderately sized datasets.
  • 2025-01-13: Testing and evaluation pipeline for zero-shot metric depth estimation on perspective, fisheye, and 360-degree datasets.
  • 2025-01-13: Complete DepthAnyCamera (DAC) training pipeline using mixed perspective camera data.
  • 2025-01-13: Complete data preparation and curation scripts.
  • [TBD] Foundation-level model trained on a large-scale, diverse dataset mixture, encompassing perspective, fisheye, and 360-degree camera data.

Visualization

ScanNet++ fisheye

The zero-shot metric depth estimation results of Depth Any Camera (DAC) are visualized on ScanNet++ fisheye videos and compared to Metric3D-v2. The visualizations of AbsRel error against ground truth highlight the superior performance of DAC.

<p align="center"> <img src="docs/video_scannet++_1.gif" alt="animated" /> </p>

Matterport3D single-view reconstruction

Additionally, we showcase DAC's application on 360-degree images, where a single forward pass of depth estimation enables full 3D scene reconstruction.

<p align="center"> <img src="docs/video_matterport3d_1.gif" alt="animated" /> </p>
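As a rough illustration of how a single 360-degree depth prediction can yield a full point cloud, the sketch below back-projects an equirectangular range map into 3D. It is not the repository's code: the function name `erp_range_to_points` and the axis and longitude/latitude conventions are assumptions made for this example.

```python
import numpy as np


def erp_range_to_points(range_map):
    """Back-project an equirectangular range map (H, W) to points (H, W, 3).

    Assumed convention (hypothetical, not taken from the DAC code):
    longitude runs from -pi to pi left to right, latitude from +pi/2 to
    -pi/2 top to bottom, with x right, y down, and z forward.
    Each value in range_map is the Euclidean distance to the camera center.
    """
    range_map = np.asarray(range_map, dtype=np.float64)
    h, w = range_map.shape

    # Angles at pixel centers.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both have shape (H, W)

    # Unit ray direction for every pixel on the sphere.
    rays = np.stack(
        [np.cos(lat) * np.sin(lon), -np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )

    # Scale each unit ray by its predicted distance.
    return rays * range_map[..., None]


if __name__ == "__main__":
    # A constant 2 m range map should produce a sphere of radius 2 m.
    points = erp_range_to_points(np.full((4, 8), 2.0))
    print(points.shape)
```

Because every pixel of a 360-degree image maps to a ray on the full sphere, one depth map covers the whole scene around the camera.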

Additional visual results and comparison with the prior SoTA can be found at <a href='https://yuliangguo.github.io/depth-any-camera/'><img src='https://img.shields.io/badge/Project_Page-Depth Any Camera-green' alt='Project Page'></a>

Performance

Depth Any Camera performs <b>significantly better</b> than the previous SoTA <b>metric</b> depth estimation models Metric3D-v2 and UniDepth in zero-shot generalization to large-FoV camera images, despite a <b>significantly smaller training dataset and model size</b>.

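| Method | Training Data Size | Matterport3D (360) AbsRel | Matterport3D (360) $\delta_1$ | Pano3D-GV2 (360) AbsRel | Pano3D-GV2 (360) $\delta_1$ | ScanNet++ (fisheye) AbsRel | ScanNet++ (fisheye) $\delta_1$ | KITTI360 (fisheye) AbsRel | KITTI360 (fisheye) $\delta_1$ |
|---|---|---|---|---|---|---|---|---|---|
| UniDepth-VitL | 3M | 0.7648 | 0.2576 | 0.7892 | 0.2469 | 0.4971 | 0.3638 | 0.2939 | 0.4810 |
| Metric3D-v2-VitL | 16M | 0.2924 | 0.4381 | 0.3070 | 0.4040 | 0.2229 | 0.5360 | 0.1997 | 0.7159 |
| Ours-Resnet101 | 670K-indoor / 130K-outdoor | **0.156** | **0.7727** | **0.1387** | **0.8115** | *0.1323* | *0.8517* | *0.1559* | *0.7858* |
| Ours-SwinL | 670K-indoor / 130K-outdoor | *0.1789* | *0.7231* | *0.1836* | *0.7287* | **0.1282** | **0.8544** | **0.1487** | **0.8222** |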

We highlight the best and second best results in bold and italic respectively (better results: AbsRel $\downarrow$ , $\delta_1 \uparrow$).
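For readers unfamiliar with the two metrics, here is a minimal sketch of their commonly used definitions: AbsRel is the mean of |pred - gt| / gt, and $\delta_1$ is the fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25. The function names and the validity mask (ground truth greater than zero) are assumptions for this example; the repository's evaluation code may apply additional masking or depth caps.

```python
import numpy as np


def _valid_pairs(pred, gt):
    """Return predictions and ground truth at pixels with positive ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    return pred[valid], gt[valid]


def abs_rel(pred, gt):
    """Mean absolute relative error (lower is better)."""
    p, g = _valid_pairs(pred, gt)
    return float(np.mean(np.abs(p - g) / g))


def delta_1(pred, gt, threshold=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) < threshold (higher is better)."""
    p, g = _valid_pairs(pred, gt)
    # Clamp predictions so that zero or negative values count as failures
    # instead of causing a division by zero.
    p = np.maximum(p, 1e-6)
    ratio = np.maximum(p / g, g / p)
    return float(np.mean(ratio < threshold))


if __name__ == "__main__":
    gt = np.array([1.0, 2.0, 4.0])
    pred = np.array([1.1, 2.0, 6.0])
    print(abs_rel(pred, gt), delta_1(pred, gt))
```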

<!-- ## Pipeline ![pipeline](docs/pipeline.png) -->

Usage

Installation

Clone the Repository

```bash
git clone https://github.com/yuliangguo/depth_any_camera
cd depth_any_camera
```

Docker Installation

This repository can be run from within Docker, as long as the NVIDIA Container Toolkit is properly configured.
For Ubuntu Installation steps, refer to this guide.

```bash
# Build the container
docker build -t dac:latest .
# Enter the container
docker run --gpus all --network host -v $(pwd):/depth_any_camera --rm -it dac /bin/bash
# Once within the container,
# source the post-entry-hooks.sh to finish the install.
source post-entry-hooks.sh
```

Conda Installation

Alternatively, this repository can be run from within Conda alone.

```bash
conda create -n dac python=3.9 -y
conda activate dac
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
export PYTHONPATH="$PWD:$PYTHONPATH"
cd dac/models/ops/
pip install -e .
cd ../../../
```

Data Preparation

Our current training set is very slim compared to prior foundation models. Currently, DAC is trained on a combination of three labeled datasets (670k images) for the indoor model and a combination of two datasets (130k images) for the outdoor model. Two 360 datasets and two fisheye datasets are used for zero-shot testing.


Please refer to DATA.md for detailed dataset preparation. Make sure the relative paths of the datasets are set correctly before proceeding to the testing and training sections.

Pre-trained models

We provide two indoor models and two outdoor models, using ResNet101 and SwinTransformer-Large (SwinL) as backbones. In addition, we provide two weaker baseline models for comparison. The download links can be found in the following table or from <a href='https://huggingface.co/yuliangguo/depth-any-camera'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow'></a>. We suggest downloading both the model configs and model weights to checkpoints so that our scripts can be run directly.

| Model Name | Training Datasets | Model Configs | Weights |
|---|---|---|---|
| dac-indoor-resnet101 (ours) | indoor mix 670k | huggingface | huggingface |
| dac-indoor-swinL (ours) | indoor mix 670k | huggingface | huggingface |
| dac-outdoor-resnet101 (ours) | outdoor mix 130k | huggingface | huggingface |
| dac-outdoor-swinL (ours) | outdoor mix 130k | huggingface | huggingface |
| idisc-metric3d-indoor-resnet101 (weak baseline 1) | indoor mix 670k | huggingface | huggingface |
| cnndepth-metric3d-indoor-resnet101 (weak baseline 2) | indoor mix 670k | huggingface | huggingface |

Demo

We provide ready-to-run demo scripts in the demo folder. demo/demo_dac_indoor.py demonstrates how to perform inference on various types of camera data, including ScanNet++ (fisheye), Matterport3D (360), and NYU (perspective), using a single metric depth model trained on perspective images. The code generates point cloud files *.ply and visualization results as shown below:

<p align="center"> <img src="demo/output/scannetpp_output_remap_subplot.jpg" alt="demo output" style="width: 70%;"> </p>

demo/demo_dac_outdoor.py similarly demonstrates how a single outdoor model handles different types of camera data, including KITTI (perspective) and KITTI360 (fisheye).

We also provide a demo script for processing a single sample. For example:

```bash
python demo/demo_dac_single.py \
    --config-file checkpoints/dac_swinl_indoor.json \
    --model-file checkpoints/dac_swinl_indoor.pt \
    --sample-file demo/input/scannetpp_sample.json \
    --out-dir demo/output
```

Run on ZipNeRF (fisheye) and ScanNet++ (fisheye) Scenes

To better facilitate the development of neural reconstruction techniques on fisheye inputs, e.g., EVER, 3DGUT and 3DGEER, we provide scripts to run depth estimation on fisheye images for a whole scene folder.

```bash
python demo/run_dac_zipnerf_scene.py
python demo/run_dac_scannetpp_scene.py
```

Our depth results can be downloaded directly from ZipNeRF DAC results and ScanNet++ DAC results. The results folder can simply be merged with the original dataset for further use. The predicted depth maps are saved as uint16 images, with visualization as shown below.

<p align="center"> <img src="docs/zipnerf_alameda_vis.gif" alt="animated" style="width: 85%;"/> </p>

Important note: Our depth maps record the Euclidean distance to the camera center rather than the z-distance to the image plane, because z-distance is not appropriate for large-FoV camera models.
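To make the distinction concrete, the following sketch converts between Euclidean distance (range) and z-depth given per-pixel unit ray directions, using a pinhole camera as the example. The helper names (`pinhole_rays`, `range_to_z`, `z_to_range`) and the intrinsics are hypothetical and not part of the DAC codebase. For a fisheye or 360 camera, the rays would come from that camera's own projection model instead.

```python
import numpy as np


def pinhole_rays(height, width, fx, fy, cx, cy):
    """Unit ray direction per pixel for an ideal pinhole camera, shape (H, W, 3)."""
    u, v = np.meshgrid(
        np.arange(width, dtype=np.float64),
        np.arange(height, dtype=np.float64),
    )
    dirs = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)


def range_to_z(range_map, rays):
    """Convert distance to the camera center into z-depth along the optical axis."""
    return np.asarray(range_map) * rays[..., 2]


def z_to_range(z_map, rays):
    """Inverse conversion; only defined where the ray has a positive z component."""
    return np.asarray(z_map) / rays[..., 2]


if __name__ == "__main__":
    rays = pinhole_rays(3, 3, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
    range_map = np.full((3, 3), 2.0)
    z_map = range_to_z(range_map, rays)
    # The center pixel looks straight ahead, so range and z-depth agree there,
    # while the corner pixels have a smaller z-depth for the same range.
    print(z_map)
```

Rays at 90 degrees or more from the optical axis have a zero or negative z component, so z-depth stops being usable there, while Euclidean distance stays well defined for every ray.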

Testing

With the provided pre-trained models saved in checkpoints/, the following command tests and evaluates a model on a given dataset, e.g., ScanNet++:

```bash
python script/test_dac.py \
    --model-file checkpoints/dac_swinl_indoor.pt \
    --model-name IDiscERP \
    --config-file configs/test/dac_swinl_indoor_test_scannetpp.json \
    --base-path datasets \
    --vis
```

Config files for testing all the reported datasets are included in configs/test. Interested users can also refer to the provided launch.json for convenient use or for debugging the testing scripts in VSCode. The following table lists the most relevant options.

| Testing dataset | Testing script | --model-file | --config-file | --model-name |
|---|---|---|---|---|
| Matterport | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| Gibson-V2 | ^ | ^ | relative path | IDiscERP |
| ScanNet++ | ^ | ^ | relative path | IDiscERP |
| NYU | ^ | ^ | relative path | IDiscERP |
| KITTI360 | ^ | checkpoints/dac-outdoor-resnet101.pt | relative path | IDisc |
| KITTI | ^ | ^ | relative path | IDisc |
| ... | scripts/test_persp.py | checkpoints/idisc-... | ... | IDisc |
| ... | ^ | checkpoints/cnndepth-... | ... | CNNDepth |

Note: IDiscERP is our modified version of the IDisc model, incorporating isolated image and positional encoding features. It has been observed to improve results in small-size data training, particularly for better depth-scale equivariance. However, these modifications are not essential for large dataset training. CNNDepth refers to the CNN portion of the IDisc model, which serves as a network baseline but consistently underperforms compared to other models.

The ResNet101 models and configuration files can be replaced with the corresponding Swin-L versions. Ensure that the --model-name parameter matches the type of trained model. For users interested in comparing our DAC framework with the Metric3D training framework, we have provided pre-trained weak baselines along with their testing scripts, as detailed in the last two rows of the table.
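When evaluating several datasets in a row, it can help to assemble the test command programmatically. The sketch below is one possible way to do so; `build_test_command` is a hypothetical helper, not part of the repository. It uses only the flags shown above, and the default script path is taken from the table and may need adjusting to the actual repository layout.

```python
import shlex

# Model names that appear in the testing table above.
KNOWN_MODEL_NAMES = {"IDiscERP", "IDisc", "CNNDepth"}


def build_test_command(model_file, config_file, model_name,
                       script="scripts/test_dac.py",
                       base_path="datasets", vis=False):
    """Return the argument list for one evaluation run (hypothetical helper)."""
    if model_name not in KNOWN_MODEL_NAMES:
        raise ValueError(f"unknown --model-name: {model_name!r}")
    cmd = [
        "python", script,
        "--model-file", model_file,
        "--model-name", model_name,
        "--config-file", config_file,
        "--base-path", base_path,
    ]
    if vis:
        cmd.append("--vis")
    return cmd


if __name__ == "__main__":
    cmd = build_test_command(
        model_file="checkpoints/dac_swinl_indoor.pt",
        config_file="configs/test/dac_swinl_indoor_test_scannetpp.json",
        model_name="IDiscERP",
        vis=True,
    )
    # Print a shell-ready string; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```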

Training

To train metric depth estimation models under the DepthAnyCamera (DAC) framework, you can run the default code for indoor training datasets as follows:

```bash
python scripts/train_dac.py \
    --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_dac_r101.json \
    --base-path datasets \
    --distributed \
    --model-name IDiscERP
```

If you wish to train with a larger backbone, use the following command:

```bash
python scripts/train_dac_large.py \
    --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_dac_swinl_s2.json \
    --base-path datasets \
    --distributed \
    --model-name IDiscERP
```

For users interested in comparing our DAC framework with the Metric3D training framework, the following command can be used:

```bash
python scripts/train_persp.py \
    --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_r101.json \
    --base-path datasets \
    --distributed \
    --model-name IDisc
```

The corresponding testing script can be found at scripts/test_persp.py.

Similar commands apply to outdoor model training. There are various options available depending on the dataset or architecture. Interested users can refer to the table below for basic usage or consult the provided launch.json for convenient use or debugging in VSCode. We also provide all the training configurations we’ve used in configs/train.

| Training Target | Training script | --config-file | --model-name |
|---|---|---|---|
| dac-indoor-resnet101 | scripts/train_dac.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-indoor-swinl | scripts/train_dac_large.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-outdoor-resnet101 | scripts/train_dac.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-outdoor-swinl | scripts/train_dac_large.py | relative path | IDiscERP or IDisc or CNNDepth |
| metric3d-indoor-resnet101 | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-indoor-swinl | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-outdoor-resnet101 | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-outdoor-swinl | scripts/train_persp.py | relative path | IDisc or CNNDepth |
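The table can be read as a small lookup: the training framework and backbone determine the script, and the script determines which model names are accepted. A minimal sketch of that lookup follows; `select_training_script` and the lowercase keys are hypothetical conveniences for this example and do not exist in the repository.

```python
# Model names accepted by each training framework, as listed in the table.
DAC_MODEL_NAMES = {"IDiscERP", "IDisc", "CNNDepth"}
PERSP_MODEL_NAMES = {"IDisc", "CNNDepth"}

# Script used for each backbone under the DAC framework.
DAC_SCRIPTS = {
    "resnet101": "scripts/train_dac.py",
    "swinl": "scripts/train_dac_large.py",
}


def select_training_script(framework, backbone, model_name):
    """Pick the training script for a (framework, backbone) pair from the table."""
    if backbone not in DAC_SCRIPTS:
        raise ValueError(f"unknown backbone: {backbone!r}")
    if framework == "dac":
        script, allowed = DAC_SCRIPTS[backbone], DAC_MODEL_NAMES
    elif framework == "metric3d":
        # The Metric3D-style baseline uses one script for both backbones.
        script, allowed = "scripts/train_persp.py", PERSP_MODEL_NAMES
    else:
        raise ValueError(f"unknown framework: {framework!r}")
    if model_name not in allowed:
        raise ValueError(f"{model_name!r} is not listed for {script}")
    return script


if __name__ == "__main__":
    print(select_training_script("dac", "swinl", "IDiscERP"))
    print(select_training_script("metric3d", "resnet101", "IDisc"))
```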

No camera parameters available?

State-of-the-art deep learning auto-calibration methods can help in this situation. Consider using them to estimate the camera's distortion parameters before applying DAC.

Acknowledgements

We thank the authors of the following awesome codebases:

For developers interested in multi-view-stereo designs contributing to the cross-camera generalization problem, we refer them to the following insightful works

License

This software is released under the MIT license. You can view a license summary here.

Citation

If you find our work useful in your research please consider citing our publication:

```bibtex
@inproceedings{Guo2025DepthAnyCamera,
  title={Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera},
  author={Yuliang Guo and Sparsh Garg and S. Mahdi H. Miangoleh and Xinyu Huang and Liu Ren},
  booktitle={CVPR},
  year={2025}
}
```


This article is auto-generated from yuliangguo/depth_any_camera via the GitHub API. Last fetched: 6/19/2026