MogaNet

[ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network

From Westlake-AI · Updated April 30, 2026

MogaNet: Multi-order Gated Aggregation Network (ICLR 2024). The project is written primarily in Jupyter Notebook, is distributed under the Apache License 2.0, and was first published in 2022. Key topics include: 3d-pose-estimation, ade20k, backbone, cnn, coco.

Latest release: moganet-pose-weights (MogaNet-Pose-Estimation-Weights), February 13, 2023
<div align="center"> <!-- <h1>MogaNet: Efficient Multi-order Gated Aggregation Network</h1> --> <h2><a href="https://arxiv.org/abs/2211.03295">MogaNet: Multi-order Gated Aggregation Network (ICLR 2024)</a></h2>

Siyuan Li<sup>*,1,2</sup>, Zedong Wang<sup>*,1</sup>, Zicheng Liu<sup>1,2</sup>, Cheng Tan<sup>1,2</sup>, Haitao Lin<sup>1,2</sup>, Di Wu<sup>1,2</sup>, Zhiyuan Chen<sup>1</sup>, Jiangbin Zheng<sup>1,2</sup>, Stan Z. Li<sup>†,1</sup>

<sup>1</sup>Westlake University, <sup>2</sup>Zhejiang University

</div> <p align="center"> <a href="https://arxiv.org/abs/2211.03295" alt="arXiv"> <img src="https://img.shields.io/badge/arXiv-2211.03295-b31b1b.svg?style=flat" /></a> <a href="https://github.com/Westlake-AI/MogaNet/blob/main/LICENSE" alt="license"> <img src="https://img.shields.io/badge/license-Apache--2.0-%23B7A800" /></a> <a href="https://colab.research.google.com/github/Westlake-AI/MogaNet/blob/main/demo.ipynb" alt="Colab"> <img src="https://colab.research.google.com/assets/colab-badge.svg" /></a> <a href="https://huggingface.co/MogaNet" alt="Huggingface"> <img src="https://img.shields.io/badge/huggingface-MogaNet-blueviolet" /></a> </p> <p align="center"> <img src="https://user-images.githubusercontent.com/44519745/202308950-00708e25-9ac7-48f0-af12-224d927ac1ae.jpg" width=100% height=100% class="center"> </p>

We propose MogaNet, a new family of efficient ConvNets designed through the lens of multi-order game-theoretic interaction, to mine informative context with a favorable complexity-performance trade-off. It scales well and achieves results competitive with state-of-the-art models while using parameters more efficiently on ImageNet and a range of standard vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction.

This repository contains the PyTorch implementation of MogaNet (ICLR 2024).

<details> <summary>Table of Contents</summary> <ol> <li><a href="#catalog">Catalog</a></li> <li><a href="#image-classification">Image Classification</a></li> <li><a href="#license">License</a></li> <li><a href="#acknowledgement">Acknowledgement</a></li> <li><a href="#citation">Citation</a></li> </ol> </details>

Catalog

We plan to release implementations of MogaNet over the next few months; please watch this repository for the latest release. The current code is a reimplementation of our official implementation in OpenMixup, and we are still cleaning up the experimental results and code. Models are available on GitHub, Baidu Cloud, and Hugging Face.

  • ImageNet-1K Training and Validation Code with timm [code] [models] [Hugging Face 🤗]
  • ImageNet-1K Training and Validation Code in OpenMixup / MMPretrain (TODO)
  • Downstream Transfer to Object Detection and Instance Segmentation on COCO [code] [models] [demo]
  • Downstream Transfer to Semantic Segmentation on ADE20K [code] [models] [demo]
  • Downstream Transfer to 2D Human Pose Estimation on COCO [code] (baselines supported) [models] [demo]
  • Downstream Transfer to 3D Human Pose Estimation (baselines supported) [code] [models]
  • Downstream Transfer to Video Prediction on MMNIST Variants [code] (baselines supported)
  • Image Classification on Google Colab and Notebook Demo [demo]
<p align="center"> <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/44519745/239330216-a93e71ee-7909-485d-8257-1b34abcd61c6.jpg" width=100% height=100% class="center"> </p>

Image Classification

1. Installation

Please check INSTALL.md for installation instructions.

2. Training and Validation

See TRAINING.md for ImageNet-1K training and validation instructions, or refer to our OpenMixup implementations. We have released pre-trained models on OpenMixup in moganet-in1k-weights. We have also reproduced the ImageNet results with this repo and released args.yaml / summary.csv / model.pth.tar in moganet-in1k-weights. The parameters of a trained model can be extracted with code.
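The last step above, pulling the model parameters out of a training checkpoint, can be sketched as follows. This is a minimal illustration rather than the repository's own extraction script: it assumes that `model.pth.tar` is a dictionary that nests the weights under a key such as `state_dict` or `state_dict_ema`, and the helper name `extract_state_dict` is hypothetical.

```python
from collections import OrderedDict


def extract_state_dict(checkpoint, prefer_ema=False):
    """Return only the model parameters from a training checkpoint.

    Assumption: `checkpoint` is either a bare parameter mapping or a dict
    that nests the parameters under "state_dict" / "state_dict_ema".
    """
    if prefer_ema:
        keys = ("state_dict_ema", "state_dict")
    else:
        keys = ("state_dict", "state_dict_ema")

    state = checkpoint
    for key in keys:
        if isinstance(checkpoint, dict) and key in checkpoint:
            state = checkpoint[key]
            break

    # Drop the "module." prefix that DataParallel / DDP wrappers add.
    cleaned = OrderedDict()
    for name, value in state.items():
        if name.startswith("module."):
            name = name[len("module."):]
        cleaned[name] = value
    return cleaned


if __name__ == "__main__":
    # Toy checkpoint standing in for a real file loaded from disk.
    toy = {
        "epoch": 299,
        "optimizer": {},
        "state_dict": {"module.head.weight": [0.1, 0.2]},
    }
    print(list(extract_state_dict(toy)))
```

With PyTorch installed, the real file would typically be read with `torch.load("model.pth.tar", map_location="cpu")`, and the returned mapping passed to the model's `load_state_dict` method.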

Here is a notebook demo of MogaNet that walks through the steps needed to run inference with MogaNet for image classification.

3. ImageNet-1K Trained Models

| Model | Resolution | Params (M) | FLOPs (G) | Top-1 / Top-5 (%) | Script | Download |
|---|---|---|---|---|---|---|
| MogaNet-XT | 224x224 | 2.97 | 0.80 | 76.5 / 93.4 | args / script | model / log |
| MogaNet-XT | 256x256 | 2.97 | 1.04 | 77.2 / 93.8 | args / script | model / log |
| MogaNet-T | 224x224 | 5.20 | 1.10 | 79.0 / 94.6 | args / script | model / log |
| MogaNet-T | 256x256 | 5.20 | 1.44 | 79.6 / 94.9 | args / script | model / log |
| MogaNet-T* | 256x256 | 5.20 | 1.44 | 80.0 / 95.0 | config / script | model / log |
| MogaNet-S | 224x224 | 25.3 | 4.97 | 83.4 / 96.9 | args / script | model / log |
| MogaNet-B | 224x224 | 43.9 | 9.93 | 84.3 / 97.0 | args / script | model / log |
| MogaNet-L | 224x224 | 82.5 | 15.9 | 84.7 / 97.1 | args / script | model / log |
| MogaNet-XL | 224x224 | 180.8 | 34.5 | 85.1 / 97.4 | args / script | model / log |

4. Analysis Tools

(1) Count the MACs of MogaNet variants:

python get_flops.py --model moganet_tiny
<p align="center"> <img src="https://user-images.githubusercontent.com/44519745/212429257-f0b09d7a-7503-4945-9517-68ea36d10e00.png" width=100% height=100% class="center"> </p>
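For intuition about what a MAC counter reports, the sketch below counts the multiply-accumulate operations of a single 2D convolution by hand. It is not the logic of `get_flops.py`; the function name `conv2d_macs` and the example layer shapes are illustrative assumptions.

```python
def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w, groups=1):
    """MACs of one Conv2d forward pass (bias ignored).

    Each output element needs (in_ch / groups) * kernel * kernel
    multiply-accumulates, and there are out_ch * out_h * out_w of them.
    """
    if in_ch % groups or out_ch % groups:
        raise ValueError("channels must be divisible by groups")
    per_output = (in_ch // groups) * kernel * kernel
    return out_ch * out_h * out_w * per_output


if __name__ == "__main__":
    # Hypothetical shapes: a 5x5 depthwise conv and a 1x1 pointwise conv
    # on a 64-channel, 56x56 feature map.
    depthwise = conv2d_macs(64, 64, kernel=5, out_h=56, out_w=56, groups=64)
    pointwise = conv2d_macs(64, 64, kernel=1, out_h=56, out_w=56)
    print(f"depthwise: {depthwise / 1e6:.2f} M MACs")
    print(f"pointwise: {pointwise / 1e6:.2f} M MACs")
```

Summing this quantity over every layer gives a whole-model figure. A depthwise convolution sets `groups` equal to the channel count, which is why large depthwise kernels stay cheap relative to pointwise layers.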

(2) Visualize Grad-CAM activation maps (or variants of Grad-CAM) of MogaNet and other popular architectures:

python cam_image.py --use_cuda --image_path /path/to/image.JPEG --model moganet_tiny --method gradcam
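The core Grad-CAM computation behind such a heat map is compact. The sketch below works on plain nested lists so that it runs without any dependencies. It is not the implementation used by `cam_image.py`; the function name `grad_cam` and the toy inputs are assumptions made for illustration.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map for one image.

    activations, gradients: nested lists shaped [channels][height][width],
    holding a conv layer's feature maps and the gradient of the target
    class score with respect to them.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pool the gradients.
    weights = [sum(map(sum, grad)) / (height * width) for grad in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Scale to [0, 1] for display; an all-zero map is left unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature maps and gradients.
    acts = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
    for row in grad_cam(acts, grads):
        print(row)
```

In practice the activations and gradients come from forward and backward hooks on a chosen layer, and the map is upsampled to the input resolution before being overlaid on the image.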
<p align="right">(<a href="#top">back to top</a>)</p>

5. Downstream Tasks

<details> <summary>Object Detection and Instance Segmentation on COCO</summary> <li><a href="https://github.com/Westlake-AI/MogaNet/tree/main/detection">MogaNet + Mask R-CNN</a></li>
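
| Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | mask mAP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| Mask R-CNN | MogaNet-XT | ImageNet-1K | 22.8M | 185.4G | 1x | 40.7 | 37.6 | config | log / model |
| Mask R-CNN | MogaNet-T | ImageNet-1K | 25.0M | 191.7G | 1x | 42.6 | 39.1 | config | log / model |
| Mask R-CNN | MogaNet-S | ImageNet-1K | 45.0M | 271.6G | 1x | 46.6 | 42.2 | config | log / model |
| Mask R-CNN | MogaNet-B | ImageNet-1K | 63.4M | 373.1G | 1x | 49.0 | 43.8 | config | log / model |
| Mask R-CNN | MogaNet-L | ImageNet-1K | 102.1M | 495.3G | 1x | 49.4 | 44.2 | config | log / model |
| Mask R-CNN | MogaNet-T | ImageNet-1K | 25.0M | 191.7G | MS 3x | 45.3 | 40.7 | config | log / model |
| Mask R-CNN | MogaNet-S | ImageNet-1K | 45.0M | 271.6G | MS 3x | 48.5 | 43.1 | config | log / model |
| Mask R-CNN | MogaNet-B | ImageNet-1K | 63.4M | 373.1G | MS 3x | 50.3 | 44.4 | config | log / model |
| Mask R-CNN | MogaNet-L | ImageNet-1K | 102.1M | 495.3G | MS 3x | 50.6 | 44.6 | config | log / model |
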
<li><a href="https://github.com/Westlake-AI/MogaNet/tree/main/detection">MogaNet + RetinaNet</a></li>
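
| Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | Config | Download |
|---|---|---|---|---|---|---|---|---|
| RetinaNet | MogaNet-XT | ImageNet-1K | 12.1M | 167.2G | 1x | 39.7 | config | log / model |
| RetinaNet | MogaNet-T | ImageNet-1K | 14.4M | 173.4G | 1x | 41.4 | config | log / model |
| RetinaNet | MogaNet-S | ImageNet-1K | 35.1M | 253.0G | 1x | 45.8 | config | log / model |
| RetinaNet | MogaNet-B | ImageNet-1K | 53.5M | 354.5G | 1x | 47.7 | config | log / model |
| RetinaNet | MogaNet-L | ImageNet-1K | 92.4M | 476.8G | 1x | 48.7 | config | log / model |
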
<li><a href="https://github.com/Westlake-AI/MogaNet/tree/main/detection">MogaNet + Cascade Mask R-CNN</a></li>

| Method | Backbone | Pretrain | Params | FLOPs | Lr schd | box mAP | mask mAP | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| Cascade Mask R-CNN | MogaNet-S | ImageNet-1K | 77.9M | 405.4G | MS 3x | 51.4 | 44.9 | config | log / model |
| Cascade Mask R-CNN | MogaNet-S | ImageNet-1K | 82.8M | 750.2G | GIOU+MS 3x | 51.7 | 45.1 | config | log / model |
| Cascade Mask R-CNN | MogaNet-B | ImageNet-1K | 101.2M | 851.6G | GIOU+MS 3x | 52.6 | 46.0 | config | log / model |
| Cascade Mask R-CNN | MogaNet-L | ImageNet-1K | 139.9M | 973.8G | GIOU+MS 3x | 53.3 | 46.1 | config | - |

</details> <details> <summary>Semantic Segmentation on ADE20K</summary> <li><a href="https://github.com/Westlake-AI/MogaNet/tree/main/segmentation">MogaNet + Semantic FPN</a></li>
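
| Method | Backbone | Pretrain | Params | FLOPs | Iters | mIoU | mAcc | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| Semantic FPN | MogaNet-XT | ImageNet-1K | 6.9M | 101.4G | 80K | 40.3 | 52.4 | config | log / model |
| Semantic FPN | MogaNet-T | ImageNet-1K | 9.1M | 107.8G | 80K | 43.1 | 55.4 | config | log / model |
| Semantic FPN | MogaNet-S | ImageNet-1K | 29.1M | 189.7G | 80K | 47.7 | 59.8 | config | log / model |
| Semantic FPN | MogaNet-B | ImageNet-1K | 47.5M | 293.6G | 80K | 49.3 | 61.6 | config | log / model |
| Semantic FPN | MogaNet-L | ImageNet-1K | 86.2M | 418.7G | 80K | 50.2 | 63.0 | config | log / model |
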
<li><a href="https://github.com/Westlake-AI/MogaNet/tree/main/segmentation">MogaNet + UperNet</a></li>
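
| Method | Backbone | Pretrain | Params | FLOPs | Iters | mIoU | mAcc | Config | Download |
|---|---|---|---|---|---|---|---|---|---|
| UperNet | MogaNet-XT | ImageNet-1K | 30.4M | 855.7G | 160K | 42.2 | 55.1 | config | log / model |
| UperNet | MogaNet-T | ImageNet-1K | 33.1M | 862.4G | 160K | 43.7 | 57.1 | config | log / model |
| UperNet | MogaNet-S | ImageNet-1K | 55.3M | 946.4G | 160K | 49.2 | 61.6 | config | log / model |
| UperNet | MogaNet-B | ImageNet-1K | 73.7M | 1050.4G | 160K | 50.1 | 63.4 | config | log / model |
| UperNet | MogaNet-L | ImageNet-1K | 113.2M | 1176.1G | 160K | 50.9 | 63.5 | config | log / model |
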
</details> <details> <summary>2D Human Pose Estimation on COCO</summary> <li><a href="https://github.com/Westlake-AI/MogaNet/tree/main/pose_estimation">MogaNet + Top-Down</a></li>
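
| Backbone | Input Size | Params | FLOPs | AP | AP<sup>50</sup> | AP<sup>75</sup> | AR | AR<sup>M</sup> | AR<sup>L</sup> | Config | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MogaNet-XT | 256x192 | 5.6M | 1.8G | 72.1 | 89.7 | 80.1 | 77.7 | 73.6 | 83.6 | config | log / model |
| MogaNet-XT | 384x288 | 5.6M | 4.2G | 74.7 | 90.1 | 81.3 | 79.9 | 75.9 | 85.9 | config | log / model |
| MogaNet-T | 256x192 | 8.1M | 2.2G | 73.2 | 90.1 | 81.0 | 78.8 | 74.9 | 84.4 | config | log / model |
| MogaNet-T | 384x288 | 8.1M | 4.9G | 75.7 | 90.6 | 82.6 | 80.9 | 76.8 | 86.7 | config | log / model |
| MogaNet-S | 256x192 | 29.0M | 6.0G | 74.9 | 90.7 | 82.8 | 80.1 | 75.7 | 86.3 | config | log / model |
| MogaNet-S | 384x288 | 29.0M | 13.5G | 76.4 | 91.0 | 83.3 | 81.4 | 77.1 | 87.7 | config | log / model |
| MogaNet-B | 256x192 | 47.4M | 10.9G | 75.3 | 90.9 | 83.3 | 80.7 | 76.4 | 87.1 | config | log / model |
| MogaNet-B | 384x288 | 47.4M | 24.4G | 77.3 | 91.4 | 84.0 | 82.2 | 77.9 | 88.5 | config | log / model |
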
</details> <details> <summary>Video Prediction on Moving MNIST</summary>
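
| Architecture | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 200 epoch | 58.0M | 19.4G | 209 | 32.15 | 89.05 | 0.9268 | 21.84 | model / log |
| gSTA (SimVPv2) | 200 epoch | 46.8M | 16.5G | 282 | 26.69 | 77.19 | 0.9402 | 22.78 | model / log |
| ViT | 200 epoch | 46.1M | 16.9G | 290 | 35.15 | 95.87 | 0.9139 | 21.67 | model / log |
| Swin Transformer | 200 epoch | 46.1M | 16.4G | 294 | 29.70 | 84.05 | 0.9331 | 22.22 | model / log |
| Uniformer | 200 epoch | 44.8M | 16.5G | 296 | 30.38 | 85.87 | 0.9308 | 22.13 | model / log |
| MLP-Mixer | 200 epoch | 38.2M | 14.7G | 334 | 29.52 | 83.36 | 0.9338 | 22.22 | model / log |
| ConvMixer | 200 epoch | 3.9M | 5.5G | 658 | 32.09 | 88.93 | 0.9259 | 21.93 | model / log |
| Poolformer | 200 epoch | 37.1M | 14.1G | 341 | 31.79 | 88.48 | 0.9271 | 22.03 | model / log |
| ConvNeXt | 200 epoch | 37.3M | 14.1G | 344 | 26.94 | 77.23 | 0.9397 | 22.74 | model / log |
| VAN | 200 epoch | 44.5M | 16.0G | 288 | 26.10 | 76.11 | 0.9417 | 22.89 | model / log |
| HorNet | 200 epoch | 45.7M | 16.3G | 287 | 29.64 | 83.26 | 0.9331 | 22.26 | model / log |
| MogaNet | 200 epoch | 46.8M | 16.5G | 255 | 25.57 | 75.19 | 0.9429 | 22.99 | model / log |
| IncepU (SimVPv1) | 2000 epoch | 58.0M | 19.4G | 209 | 21.15 | 64.15 | 0.9536 | 23.99 | model / log |
| gSTA (SimVPv2) | 2000 epoch | 46.8M | 16.5G | 282 | 15.05 | 49.80 | 0.9675 | 25.97 | model / log |
| ViT | 2000 epoch | 46.1M | 16.9G | 290 | 19.74 | 61.65 | 0.9539 | 24.59 | model / log |
| Swin Transformer | 2000 epoch | 46.1M | 16.4G | 294 | 19.11 | 59.84 | 0.9584 | 24.53 | model / log |
| Uniformer | 2000 epoch | 44.8M | 16.5G | 296 | 18.01 | 57.52 | 0.9609 | 24.92 | model / log |
| MLP-Mixer | 2000 epoch | 38.2M | 14.7G | 334 | 18.85 | 59.86 | 0.9589 | 24.58 | model / log |
| ConvMixer | 2000 epoch | 3.9M | 5.5G | 658 | 22.30 | 67.37 | 0.9507 | 23.73 | model / log |
| Poolformer | 2000 epoch | 37.1M | 14.1G | 341 | 20.96 | 64.31 | 0.9539 | 24.15 | model / log |
| ConvNeXt | 2000 epoch | 37.3M | 14.1G | 344 | 17.58 | 55.76 | 0.9617 | 25.06 | model / log |
| VAN | 2000 epoch | 44.5M | 16.0G | 288 | 16.21 | 53.57 | 0.9646 | 25.49 | model / log |
| HorNet | 2000 epoch | 45.7M | 16.3G | 287 | 17.40 | 55.70 | 0.9624 | 25.14 | model / log |
| MogaNet | 2000 epoch | 46.8M | 16.5G | 255 | 15.67 | 51.84 | 0.9661 | 25.70 | model / log |
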
</details> <details> <summary>Video Prediction on Moving FMNIST</summary>
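
| Architecture | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 200 epoch | 58.0M | 19.4G | 209 | 30.77 | 113.94 | 0.8740 | 21.81 | model / log |
| gSTA (SimVPv2) | 200 epoch | 46.8M | 16.5G | 282 | 25.86 | 101.22 | 0.8933 | 22.61 | model / log |
| ViT | 200 epoch | 46.1M | 16.9G | 290 | 31.05 | 115.59 | 0.8712 | 21.83 | model / log |
| Swin Transformer | 200 epoch | 46.1M | 16.4G | 294 | 28.66 | 108.93 | 0.8815 | 22.08 | model / log |
| Uniformer | 200 epoch | 44.8M | 16.5G | 296 | 29.56 | 111.72 | 0.8779 | 21.97 | model / log |
| MLP-Mixer | 200 epoch | 38.2M | 14.7G | 334 | 28.83 | 109.51 | 0.8803 | 22.01 | model / log |
| ConvMixer | 200 epoch | 3.9M | 5.5G | 658 | 31.21 | 115.74 | 0.8709 | 21.71 | model / log |
| Poolformer | 200 epoch | 37.1M | 14.1G | 341 | 30.02 | 113.07 | 0.8750 | 21.95 | model / log |
| ConvNeXt | 200 epoch | 37.3M | 14.1G | 344 | 26.41 | 102.56 | 0.8908 | 22.49 | model / log |
| VAN | 200 epoch | 44.5M | 16.0G | 288 | 31.39 | 116.28 | 0.8703 | 22.82 | model / log |
| HorNet | 200 epoch | 45.7M | 16.3G | 287 | 29.19 | 110.17 | 0.8796 | 22.03 | model / log |
| MogaNet | 200 epoch | 46.8M | 16.5G | 255 | 25.14 | 99.69 | 0.8960 | 22.73 | model / log |
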
</details>

License

This project is released under the Apache 2.0 license.

Acknowledgement

Our implementation is mainly based on the following codebases. We thank the authors for their excellent work.

  • pytorch-image-models (timm): PyTorch image models, scripts, pretrained weights.
  • PoolFormer: Official PyTorch implementation of MetaFormer.
  • ConvNeXt: Official PyTorch implementation of ConvNeXt.
  • OpenMixup: Open-source toolbox for visual representation learning.
  • MMDetection: OpenMMLab Detection Toolbox and Benchmark.
  • MMSegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark.
  • MMPose: OpenMMLab Pose Estimation Toolbox and Benchmark.
  • MMHuman3D: OpenMMLab 3D Human Parametric Model Toolbox and Benchmark.
  • OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning.

Citation

If you find this repository helpful, please consider citing:

@inproceedings{iclr2024MogaNet,
  title={MogaNet: Multi-order Gated Aggregation Network},
  author={Siyuan Li and Zedong Wang and Zicheng Liu and Cheng Tan and Haitao Lin and Di Wu and Zhiyuan Chen and Jiangbin Zheng and Stan Z. Li},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
<p align="right">(<a href="#top">back to top</a>)</p>

This article is auto-generated from Westlake-AI/MogaNet via the GitHub API. Last fetched: 6/14/2026