GitPedia

Image Generation models

Generative models (GAN, VAE, Diffusion Models, Autoregressive Models) implemented with Pytorch, Pytorch_lightning and hydra.

From Victarry·Updated February 23, 2026·View on GitHub·

image:https://img.shields.io/badge/-Python 3.7--3.9-blue?style=for-the-badge&logo=python&logoColor=white[python, link=https://pytorch.org/get-started/locally/] image:https://img.shields.io/badge/-PyTorch 1.8+-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white[pytorch, link=https://pytorch.org/] image:https://img.shields.io/badge/-Lightning 1.3+-792ee5?style=for-the-badge&logo=pytorchlightning&logoColor=white[pytorch_lignthing, link=https://www.pytorchlightning.ai/] image:https://img.shields... The project is written primarily in Python, first published in 2021. Key topics include: computer-vision, diffusion-models, gan-pytorch, gans, generative-adversarial-network.

Latest release: v0.1.0Release first version.
May 31, 2022View Changelog →

:img-size: 200
:toc: macro
++++

<div align="center"> ++++ = Colletions of Image Generation Models

image:https://img.shields.io/badge/-Python 3.7--3.9-blue?style=for-the-badge&logo=python&logoColor=white[python, link=https://pytorch.org/get-started/locally/]
image:https://img.shields.io/badge/-PyTorch 1.8+-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white[pytorch, link=https://pytorch.org/]
image:https://img.shields.io/badge/-Lightning 1.3+-792ee5?style=for-the-badge&logo=pytorchlightning&logoColor=white[pytorch_lignthing, link=https://www.pytorchlightning.ai/]
image:https://img.shields.io/badge/config-hydra 1.1-89b8cd?style=for-the-badge&labelColor=gray[hydra, link=https://hydra.cc/]

An easily scalable and hierachical framework including lots of image generation method with various datasets.

++++

</div> <br> <br> ++++

Highlights 💡:
[Highlights:]

  • Various types of image generation methods(Continuous updating):
    ** GANs: WGAN, InfoGAN, BiGAN
    ** VAEs: VQ-VAE, Beta-VAE, FactorVAE
    ** Augoregressive Models: PixelCNN
    ** Diffusion Models: DDPM
  • Decomposition of model training, datasets and networks:

[source, bash]

python run.py model=wgan networks=conv64 datamodule=celeba exp_name=wgan/celeba_conv64

  • Hierachical configuration of experiment in yaml file
    ** Manual change of configs in configs/model, configs/datamodule and configs/networks
    ** Run predefined experiments in configs/experiment

[source, bash]

python run.py experiment=vanilla_gan/cifar10

** Override hyperparameters from command line
+
[source, bash]

python run.py experiment=vanilla_gan/cifar10 model.lrG=1e-3 model.lrD=1e-3 exp_name=vanilla_gan/custom_lr

  • Run multiple experiments at the same time:
    ** Grid search of hyperparameters:

[source, bash]

python run.py experiment=vae/mnist_conv model.lr=1e-3,5e-4,1e-4 "exp_name=vae/lr_${model.lr}"

** Run multiple experiments from config files:
+
[source, bash]

python run.py -m experiment=vae/mnist_conv,vae/cifar10,vae/celeba

== Setup

  • Clone this repo

[source, bash]

git clone https://github.com/Victarry/Image-Generation-models.git

  • Create new python environment using conda and install requirements

[source, bash]

conda env create -n image-generation python=3.10
conda activate image-generation
pip install requirement.txt

  • Run your first experiment ✔️

[source, bash]

python run.py experiment=vae/mnist_conv

For different datasets, refer to documentation of datasets.

== Project Structure

== Generative Adversarial Networks(GANs)

=== GAN
Generative adversarial nets. +
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio* +
NeurIPS 2014. [https://arxiv.org/abs/1406.2661[PDF]] [https://arxiv.org/abs/1701.00160[Tutorial]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/gan/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/gan/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/gan/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== LSGAN
Least Squares Generative Adversarial Networks. +
Xudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang, Stephen Paul Smolley. +
ICCV 2017. [https://arxiv.org/abs/1611.04076[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/lsgan/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/lsgan/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/lsgan/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== WGAN
Wasserstein GAN +
Martin Arjovsky, Soumith Chintala, Léon Bottou. +
ICML 2017. [https://arxiv.org/abs/1701.07875[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/wgan/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/wgan/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/wgan/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== WGAN-GP
Improved training of wasserstein gans +
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville +
NeurIPS 2017. [https://arxiv.org/abs/1704.00028[PDF]]
[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/wgangp/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/wgangp/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/wgangp/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== VAE-GAN
Autoencoding beyond pixels using a learned similarity metric. +
Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther. +
ICML 2016. [https://arxiv.org/abs/1512.09300[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/vaegan/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/vaegan/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/vaegan/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== BiGAN/ALI
BiGAN Adversarial Feature Learning +
Jeff Donahue, Philipp Krähenbühl, Trevor Darrell. +
ICLR 2017. [https://arxiv.org/abs/1605.09782[PDF]]

ALI Adversarial Learned Inference +
Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville +
ICLR 2017. [https://arxiv.org/abs/1606.00704[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/bigan/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/bigan/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/bigan/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== GGAN
Geometric GAN +
Jae Hyun Lim, Jong Chul Ye. +
Arxiv 2017. [https://arxiv.org/abs/1705.02894[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/ggan/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/ggan/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/ggan/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== InfoGAN
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets +
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel +
NeruIPS 2016. [https://arxiv.org/abs/1606.03657[PDF]]

[cols="5*", options="header"]
|===
^| Manipulated Latent
^| Random samples
^| Discrete Latent (class label)
^| Continuous Latent-1 (rotation)
^| Continuous Latent-2 (thickness)

^.^| Results
| image:assets/infogan/random.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/infogan/class.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/infogan/rotation.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/infogan/thickness.jpg[cifar10_conv, {img-size}, {img-size}]
|===

== Variational Autoencoders(VAEs)

=== VAE
Auto-Encoding Variational Bayes. +
Diederik P.Kingma, Max Welling. +
ICLR 2014. [https://arxiv.org/abs/1312.6114[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/vae/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/vae/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/vae/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== cVAE
Learning Structured Output Representation using Deep Conditional Generative Models +
Kihyuk Sohn, Honglak Lee, Xinchen Yan. +
NeurIPS 2015. [https://papers.nips.cc/paper/2015/hash/8d55a249e6baa5c06772297520da2051-Abstract.html[PDF]]

[cols="3*", options="header"]
|===
^| Dataset
^| MNIST
^| CIFAR10

^.^| Results
| image:assets/cvae/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/cvae/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== Beta-VAE
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework +
Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner. +
ICLR 2017. [https://openreview.net/forum?id=Sy2fzU9gl[PDF]]

[cols="3*", options="header"]
|===
^| Dataset
^| CelebA
^| dsprites

^.^| Sample
| image:assets/beta_vae/celeba_sample.jpg[celeba, {img-size}, {img-size}]
| image:assets/beta_vae/dsprites_sample.jpg[dsprites, {img-size}, {img-size}]

^.^| Latent Interpolation
| image:assets/beta_vae/celeba_traverse.jpg[celeba, {img-size}, {img-size}]
| image:assets/beta_vae/dsprites_traverse.jpg[dsprites, {img-size}, {img-size}]
|===

=== Factor-VAE
Disentangling by Factorising +
Hyunjik Kim, Andriy Mnih. +
NeurIPS 2017. [https://arxiv.org/abs/1802.05983[PDF]]

[cols="3*", options="header"]
|===
^| Dataset
^| CelebA
^| dsprites

^.^| Sample
| image:assets/factor_vae/fvae_sample_celeba.jpg[celeba, {img-size}, {img-size}]
| image:assets/factor_vae/fvae_dsprites_sample.jpg[dsprites, {img-size}, {img-size}]

^.^| Latent Interpolation
| image:assets/factor_vae/fvae_celeba_traverse.jpg[celeba, {img-size}, {img-size}]
| image:assets/factor_vae/fvae_dsprites_traverse.jpg[dsprites, {img-size}, {img-size}]
|===

=== AAE
Adversarial Autoencoders. +
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey. +
arxiv 2015. [https://arxiv.org/abs/1511.05644[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/aae/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/aae/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/aae/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

=== AGE
AGE Adversarial Generator-Encoder Networks. +
Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky. +
AAAI 2018. [https://arxiv.org/abs/1704.02304[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/age/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| TODO
| TODO
|===

=== VQ-VAE
Neural Discrete Representation Learning. +
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu +
NeruIPS 2017. [https://arxiv.org/abs/1711.00937[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Ground truth
| image:assets/vqvae/mnist_real.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/vqvae/celeba_real.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/vqvae/cifar10_real.jpg[cifar10_conv, {img-size}, {img-size}]

^.^| Reconstruction
| image:assets/vqvae/mnist_recon.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/vqvae/celeba_recon.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/vqvae/cifar10_recon.jpg[cifar10_conv, {img-size}, {img-size}]
|===

== Augoregressive Models

=== MADE: Masked Autoencoder for Distribution Estimation
MADE: Masked Autoencoder for Distribution Estimation +
Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle +
ICML 2015. [https://arxiv.org/abs/1502.03509[PDF]]

[cols="2*", options="header"]
|===
^| Dataset
^.^| Samples

^.^| MNIST
| image:assets/made/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
|===

=== PixelCNN
Conditional Image Generation with PixelCNN Decoders +
Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu +
NeruIPS 2016. [https://arxiv.org/abs/1606.05328[PDF]]

[cols="3*", options="header"]
|===
^| Dataset
^.^| Samples
^.^| Class Condition Samples

^.^| MNIST
| image:assets/pixelcnn/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/pixelcnn/mnist_cond.jpg[mnist_mlp, {img-size}, {img-size}]
|===

=== Transformer
Vanilla transformer based augoregressive models.

[cols="3*", options="header"]
|===
^| Dataset
^.^| Samples
^.^| Class Condition Samples

^.^| MNIST
| image:assets/tar/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/tar/mnist_cond.jpg[mnist_mlp, {img-size}, {img-size}]
|===

[source, bash]

python run.py experiment=tar/mnist

python run.py experiment=tar/mnist_cond

== Diffusion Models
=== DDPM
Denoising Diffusion Probabilistic Models +
Jonathan Ho, Ajay Jain, Pieter Abbeel +
NeurIPS 2020. [https://arxiv.org/abs/2006.11239[PDF]]

[cols="4*", options="header"]
|===
^| Dataset
^| MNIST
^| CelebA
^| CIFAR10

^.^| Results
| image:assets/ddpm/mnist.jpg[mnist_mlp, {img-size}, {img-size}]
| image:assets/ddpm/celeba.jpg[cleba_conv, {img-size}, {img-size}]
| image:assets/ddpm/cifar10.jpg[cifar10_conv, {img-size}, {img-size}]
|===

Contributors

Showing top 1 contributor by commit count.

View all contributors on GitHub →

This article is auto-generated from Victarry/Image-Generation-models via the GitHub API.Last fetched: 6/20/2026