GitPedia

EfficientViT

Efficient vision foundation models for high-resolution generation and perception.

From mit-han-lab · Updated June 14, 2026 · View on GitHub

- (🔥 New) [2025/09/05] We will no longer maintain this codebase. All future updates and announcements will be made on [DC-Gen](https://github.com/dc-ai-projects/DC-Gen).
- (🔥 New) [2025/01/24] We released DC-AE-SANA-1.1: [doc](https://github.com/mit-han-lab/efficientvit/blob/master/assets/docs/dc_ae_sana_1.1.md).
- (🔥 New) [2025/01/23] DC-AE and SANA are accepted by ICLR 2025.
- (🔥 New) [2025/01/14] We released **DC-AE+USiT models**: [model](https://huggingface.co/collections/mit-han-lab/dc-...

The project is written primarily in Python, distributed under the Apache License 2.0, and was first published in 2023. It has 3,321 stars and 250 forks on GitHub. Key topics include: deep-compression-autoencoder, efficient-diffusion-model, efficientvit, high-resolution, imagenet.

Efficient Vision Foundation Models for High-Resolution Generation and Perception


News

  • (🔥 New) [2025/09/05] We will no longer maintain this codebase. All future updates and announcements will be made on DC-Gen.
  • (🔥 New) [2025/01/24] We released DC-AE-SANA-1.1: doc.
  • (🔥 New) [2025/01/23] DC-AE and SANA are accepted by ICLR 2025.
  • (🔥 New) [2025/01/14] We released DC-AE+USiT models: model, training. Using the default training settings and sampling strategy, DC-AE+USiT-2B achieves 1.72 FID on ImageNet 512x512, surpassing the SOTA diffusion model EDM2-XXL and SOTA auto-regressive image generative models (MAGVIT-v2 and MAR-L).

Content

[ICLR 2025] Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models [paper] [readme] [poster]

Deep Compression Autoencoder (DC-AE) is a new family of high-spatial compression autoencoders with a spatial compression ratio of up to 128 while maintaining reconstruction quality. It accelerates all latent diffusion models regardless of the diffusion model architecture.
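To make the compression ratio concrete, the arithmetic can be sketched as below. This is a minimal illustration, not code from the repository: the helper names `latent_shape` and `latent_positions` are hypothetical, and it assumes that a spatial compression ratio of f means each image side is divided by f. The channel count of 128 is taken from the checkpoint name `dc-ae-f64c128` that appears elsewhere on this page, and reading `c128` as 128 latent channels is also an assumption.

```python
def latent_shape(height, width, spatial_ratio, latent_channels):
    """Return (channels, latent_height, latent_width) for an image.

    Assumes the autoencoder divides each spatial side by `spatial_ratio`.
    """
    if height % spatial_ratio or width % spatial_ratio:
        raise ValueError("image sides must be divisible by the spatial ratio")
    return (latent_channels, height // spatial_ratio, width // spatial_ratio)


def latent_positions(height, width, spatial_ratio):
    """Number of spatial positions in the latent grid."""
    return (height // spatial_ratio) * (width // spatial_ratio)


if __name__ == "__main__":
    # Compare a conventional ratio of 8 with higher ratios on a 1024x1024 image.
    for ratio in (8, 32, 64, 128):
        print(ratio, latent_positions(1024, 1024, ratio))
```

Under these assumptions, moving from a ratio of 8 to a ratio of 64 shrinks the latent grid by a factor of 64, which is where the savings for the downstream diffusion model would come from. Any additional patchification inside the diffusion model is ignored here.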

Demo

<p align="center"> <b> Figure 1: We address the reconstruction accuracy drop of high spatial-compression autoencoders. </b> </p>

<p align="center"> <b> Figure 2: DC-AE speeds up latent diffusion models. </b> </p>

<p align="center"> <img src="https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0/resolve/main/assets/dc_ae_sana.jpg" width="1200"> </p>

<p align="center"> <b> Figure 3: DC-AE enables efficient text-to-image generation on the laptop: <a href="https://nvlabs.github.io/Sana/">SANA</a>. </b> </p>

[CVPR 2024 eLVM Workshop] EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss [paper] [online demo] [readme]

EfficientViT-SAM is a new family of accelerated segment anything models, built by replacing SAM's heavy image encoder with EfficientViT. It delivers a measured 48.9x TensorRT speedup over SAM-ViT-H on an A100 GPU without sacrificing accuracy.

<p align="left"> <img src="https://huggingface.co/mit-han-lab/efficientvit-sam/resolve/main/sam_zero_shot_coco_mAP.png" width="500"> </p>
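The encoder replacement described above can be pictured with a toy sketch. Everything in it is a hypothetical stand-in: `PromptableSegmenter`, `heavy_encoder`, `light_encoder`, and `toy_mask_decoder` are not names from SAM or from this repository, and the "encoders" are simple row averages rather than neural networks. The only point it illustrates is that the image encoder can be swapped while the rest of the pipeline is reused, as long as the embedding interface stays the same.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

Image = List[List[float]]
Embedding = List[float]


@dataclass(frozen=True)
class PromptableSegmenter:
    """Hypothetical two-stage model: an image encoder feeding a mask decoder."""
    image_encoder: Callable[[Image], Embedding]
    mask_decoder: Callable[[Embedding, int], List[int]]

    def predict(self, image: Image, prompt_row: int) -> List[int]:
        embedding = self.image_encoder(image)
        return self.mask_decoder(embedding, prompt_row)


def heavy_encoder(image: Image) -> Embedding:
    # Stand-in for a large encoder: one feature per row, using every pixel.
    return [sum(row) / len(row) for row in image]


def light_encoder(image: Image) -> Embedding:
    # Stand-in for a cheaper encoder: same output shape, half the pixels read.
    return [sum(row[::2]) / len(row[::2]) for row in image]


def toy_mask_decoder(embedding: Embedding, prompt_row: int) -> List[int]:
    # Mark rows whose feature is close to the feature of the prompted row.
    target = embedding[prompt_row]
    return [1 if abs(value - target) < 0.5 else 0 for value in embedding]


baseline = PromptableSegmenter(heavy_encoder, toy_mask_decoder)
# Swap only the encoder; the decoder object is carried over unchanged.
accelerated = replace(baseline, image_encoder=light_encoder)

image = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
baseline_mask = baseline.predict(image, prompt_row=1)
accelerated_mask = accelerated.predict(image, prompt_row=1)
```

In this toy setting both variants produce the same mask. Whether a real replacement encoder preserves accuracy is an empirical question, which is what the measured results reported above address.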

[ICCV 2023] EfficientViT-Classification [paper] [readme]

Efficient image classification models with EfficientViT backbones.

<p align="left"> <img src="https://huggingface.co/han-cai/efficientvit-cls/resolve/main/efficientvit_cls_results.png" width="600"> </p>

[ICCV 2023] EfficientViT-Segmentation [paper] [readme]

Efficient semantic segmentation models with EfficientViT backbones.


EfficientViT-GazeSAM [readme]

Gaze-prompted image segmentation models capable of running in real time with TensorRT on an NVIDIA RTX 4070.


Getting Started

```bash
conda create -n efficientvit python=3.10
conda activate efficientvit
pip install -U -r requirements.txt
```

Third-Party Implementation/Integration

Contact

Han Cai

Reference

If EfficientViT, EfficientViT-SAM, or DC-AE is useful or relevant to your research, please recognize our contributions by citing our papers:

```bibtex
@inproceedings{cai2023efficientvit,
  title={Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction},
  author={Cai, Han and Li, Junyan and Hu, Muyan and Gan, Chuang and Han, Song},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={17302--17313},
  year={2023}
}
```

```bibtex
@article{zhang2024efficientvit,
  title={EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss},
  author={Zhang, Zhuoyang and Cai, Han and Han, Song},
  journal={arXiv preprint arXiv:2402.05008},
  year={2024}
}
```

```bibtex
@article{chen2024deep,
  title={Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models},
  author={Chen, Junyu and Cai, Han and Chen, Junsong and Xie, Enze and Yang, Shang and Tang, Haotian and Li, Muyang and Lu, Yao and Han, Song},
  journal={arXiv preprint arXiv:2410.10733},
  year={2024}
}
```


This article is auto-generated from mit-han-lab/efficientvit via the GitHub API. Last fetched: 6/15/2026