RollingDepth
[CVPR 2025] RollingDepth: Video Depth without Video Models
This repository is the official implementation of the paper "Video Depth without Video Models". The project is written primarily in Python, distributed under the Apache License 2.0, and was first published in 2024. Key topics include depth-estimation, diffusion, monocular-depth-estimation, video-depth, and video-depth-estimation.
Bingxin Ke<sup>1</sup>,
Dominik Narnhofer<sup>1</sup>,
Shengyu Huang<sup>1</sup>,
Lei Ke<sup>2</sup>,
Torben Peters<sup>1</sup>,
Katerina Fragkiadaki<sup>2</sup>,
Anton Obukhov<sup>1</sup>,
Konrad Schindler<sup>1</sup>
<sup>1</sup>ETH Zurich,
<sup>2</sup>Carnegie Mellon University
News
2025-02-26: Paper is accepted to CVPR 2025. <br>
2024-12-02: Paper is on arXiv.<br>
2024-11-28: Inference code is released.<br>
Setup
The inference code was tested on: Debian 12, Python 3.12.7 (venv), CUDA 12.4, GeForce RTX 3090
Repository

    git clone https://github.com/prs-eth/RollingDepth.git
    cd RollingDepth
Python environment
Create a Python environment:

    # with venv
    python -m venv venv/rollingdepth
    source venv/rollingdepth/bin/activate

    # or with conda
    conda create --name rollingdepth python=3.12
    conda activate rollingdepth
Dependencies
Install dependencies:

    pip install -r requirements.txt

    # Install modified diffusers with cross-frame self-attention
    bash script/install_diffusers_dev.sh
We use pyav for video I/O, which relies on ffmpeg (tested with version 5.1.6-0+deb12u1).
To see the modifications in diffusers, search for the comment "Modified in RollingDepth".
Test on your videos
All scripts are designed to run from the project root directory.
Prepare input videos
- Use sample videos:

      bash script/download_sample_data.sh

  These example videos are to be used only as debug/demo input together with the code and should not be distributed outside of the repo.
- Or place your videos in a directory, for example, under `data/samples`.
Run with presets

    python run_video.py \
        -i data/samples \
        -o output/samples_fast \
        -p fast \
        --verbose

- `-p` or `--preset`: preset options
  - `fast` for fast inference, with dilations [1, 25] (flexible), fp16, without refinement, at max. resolution 768.
  - `fast1024` for fast inference at resolution 1024.
  - `full` for better details, with dilations [1, 10, 25] (flexible), fp16, with 10 refinement steps, at max. resolution 1024.
  - `paper` for reproducing paper numbers, with (fixed) dilations [1, 10, 25], fp32, with 10 refinement steps, at max. resolution 768.
- `-i` or `--input-video`: path to input data, can be a single video file, a text file with video paths, or a directory of videos.
- `-o` or `--output-dir`: output directory.
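
Because `-i` also accepts a text file with video paths, such a list can be generated from a directory. The following is a minimal sketch, not part of the repository: the helper name `write_video_list`, the set of extensions, and the one-path-per-line layout are all assumptions, since the list format is not specified here.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to the formats you actually use.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def write_video_list(video_dir, list_path):
    """Write the paths of all videos under video_dir to list_path.

    One path per line is an assumption about the expected list format;
    check `python run_video.py --help` before relying on it.
    """
    videos = sorted(
        p for p in Path(video_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in videos))
    return videos
```

The resulting file could then be passed as `-i path/to/list.txt` in place of a directory.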
Passing these inference arguments will override the preset settings:
- `--res` or `--processing-resolution`: the maximum resolution (in pixels) at which image processing will be performed. If set to 0, processes at the original input image resolution.
- `--refine-step`: number of refinement iterations to improve accuracy and details. Set to 0 to disable refinement.
- `--snip-len` or `--snippet-lengths`: number of frames to analyze in each snippet.
- `-d` or `--dilations`: spacing between frames for temporal analysis. Accepts multiple values, e.g. `-d 1 10 25`.
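
To make the snippet length and dilation options concrete, a snippet of length n with dilation d can be read as n frames spaced d frames apart. The sketch below enumerates such snippets for a toy video. It is an illustration only and not the repository's code: the function name `dilated_snippets`, the default snippet length of 3, and the one-frame stride between snippet starts are assumptions.

```python
def dilated_snippets(num_frames, snippet_len=3, dilation=1, stride=1):
    """Return the frame indices of every snippet that fits into the video.

    A snippet starting at frame s covers s, s + dilation, ...,
    s + (snippet_len - 1) * dilation. The stride between snippet starts
    is an assumption made for illustration.
    """
    span = (snippet_len - 1) * dilation  # distance from first to last frame
    return [
        [start + k * dilation for k in range(snippet_len)]
        for start in range(0, num_frames - span, stride)
    ]


# Toy example: 8 frames, snippets of 3 frames, dilation 2.
for snippet in dilated_snippets(num_frames=8, snippet_len=3, dilation=2):
    print(snippet)
```

Under these assumptions, a 3-frame snippet with dilation 25 spans 51 frames, so a shorter clip yields no snippet at that dilation.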
Clip the sub-sequence to be processed:
- `--from` or `--start-frame`: the starting frame index for processing. Defaults to 0.
- `--frames` or `--frame-count`: number of frames to process after the starting frame. Set to 0 (default) to process until the end of the video.
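
The interaction between the start frame and the frame count can be sketched as a small helper that returns the selected frame indices. This is a hypothetical illustration (the name `clip_frame_range` is not from the repository), and clamping out-of-range values to the video length is an assumption.

```python
def clip_frame_range(total_frames, start_frame=0, frame_count=0):
    """Return the range of frame indices selected by --from / --frames.

    frame_count == 0 means "until the end of the video". Clamping to the
    video length is an assumption about how out-of-range values behave.
    """
    start = min(max(start_frame, 0), total_frames)
    if frame_count == 0:
        end = total_frames
    else:
        end = min(start + frame_count, total_frames)
    return range(start, end)
```

For example, `clip_frame_range(100, start_frame=30, frame_count=50)` selects frames 30 through 79, which corresponds to passing `--from 30 --frames 50`.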
Output settings:
- `--fps` or `--output-fps`: frame rate (FPS) for the output video. Set to 0 (default) to match the input video's frame rate.
- `--restore-res` or `--restore-resolution`: whether to restore the output to the original input resolution after processing. Default: False.
- `--save-sbs` or `--save-side-by-side`: whether to save side-by-side videos of RGB and colored depth. Default: True.
- `--save-npy`: whether to save depth maps as .npy files. Default: True.
- `--save-snippets`: whether to save initial snippets. Default: False.
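
When running several configurations, it can help to assemble the command line programmatically. The sketch below builds an argument list from a preset plus value-taking overrides, using only flags documented above. The helper `build_command` and the output directory name are hypothetical. Boolean options are left out on purpose, because their exact command-line form should be checked with `--help`.

```python
import sys


def build_command(input_path, output_dir, preset="fast", overrides=None):
    """Assemble the argument list for run_video.py.

    overrides maps a flag to a value or a list of values,
    e.g. {"--res": 512, "--dilations": [1, 10, 25]}.
    """
    cmd = [sys.executable, "run_video.py",
           "-i", str(input_path), "-o", str(output_dir), "-p", preset]
    for flag, value in (overrides or {}).items():
        values = value if isinstance(value, (list, tuple)) else [value]
        cmd.append(flag)
        cmd.extend(str(v) for v in values)
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "data/samples", "output/samples_custom", preset="fast",
        overrides={"--res": 512, "--dilations": [1, 10, 25], "--fps": 0},
    )
    print(" ".join(cmd))
    # To launch inference from the project root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` avoids shell quoting issues with paths that contain spaces.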
Other arguments
- Run `python run_video.py --help` for details on other arguments.
- To reduce GPU memory usage, pass `--max-vae-bs 1 --unload-snippet true` and use a smaller resolution, e.g. `--res 512`.
Checkpoint cache
By default, the checkpoint is stored in the Hugging Face cache. The HF_HOME environment variable defines its location and can be overridden, e.g.:

    export HF_HOME=$(pwd)/cache

Alternatively, use the following script to download the checkpoint weights locally, then specify the checkpoint path with `-c checkpoint/rollingdepth-v1-0`:

    bash script/download_weight.sh
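
When inference is launched from a Python script rather than a shell, the HF_HOME override described above can be applied to a single child process instead of the whole session. This is a minimal sketch; the helper name `env_with_hf_home` is hypothetical, and the `cache` directory mirrors the shell example.

```python
import os
from pathlib import Path


def env_with_hf_home(cache_dir="cache"):
    """Return a copy of the current environment with HF_HOME overridden."""
    env = dict(os.environ)
    env["HF_HOME"] = str(Path(cache_dir).resolve())
    return env


env = env_with_hf_home("cache")
print(env["HF_HOME"])
# Pass it to a child process started from the project root, e.g.:
# subprocess.run(["python", "run_video.py", "-i", "data/samples",
#                 "-o", "output/samples_fast", "-p", "fast"],
#                env=env, check=True)
```

The parent process environment stays untouched, so other tools that use the default Hugging Face cache are not affected.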
Evaluation on test datasets
Coming soon
Citation

    @InProceedings{ke2024rollingdepth,
        title     = {Video Depth without Video Models},
        author    = {Bingxin Ke and Dominik Narnhofer and Shengyu Huang and Lei Ke and Torben Peters and Katerina Fragkiadaki and Anton Obukhov and Konrad Schindler},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        year      = {2025}
    }
Acknowledgments
We thank Yue Pan, Shuchang Liu, Nando Metzger, and Nikolai Kalischek for fruitful discussions.
We are grateful to redmond.ai (robin@redmond.ai) for providing GPU resources.
License
The code of this work is licensed under the Apache License, Version 2.0 (as defined in LICENSE).
The model is licensed under the RAIL++-M License (as defined in LICENSE-MODEL).
By downloading and using the code and model, you agree to the terms in LICENSE and LICENSE-MODEL, respectively.
