Annotated deep learning paper implementations
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
This is a collection of simple PyTorch implementations of neural networks and related algorithms, documented with explanations. The project is written primarily in Python, distributed under the MIT License, and was first published in 2020. It has 66,851 stars and 6,711 forks on GitHub. Key topics include: attention, deep-learning, deep-learning-tutorial, gan, literate-programming.
labml.ai Deep Learning Paper Implementations
This is a collection of simple PyTorch implementations of
neural networks and related algorithms.
These implementations are documented with explanations,
and the website
renders these as side-by-side formatted notes.
We believe these would help you understand these algorithms better.

We are actively maintaining this repo and adding new
implementations almost weekly.
Paper Implementations
✨ Transformers
- JAX implementation
- Multi-headed attention
- Triton Flash Attention
- Transformer building blocks
- Transformer XL
- Rotary Positional Embeddings
- Attention with Linear Biases (ALiBi)
- RETRO
- Compressive Transformer
- GPT Architecture
- GLU Variants
- kNN-LM: Generalization through Memorization
- Feedback Transformer
- Switch Transformer
- Fast Weights Transformer
- FNet
- Attention Free Transformer
- Masked Language Model
- MLP-Mixer: An all-MLP Architecture for Vision
- Pay Attention to MLPs (gMLP)
- Vision Transformer (ViT)
- Primer EZ
- Hourglass
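As a small illustration of one entry in the list above, Attention with Linear Biases (ALiBi) adds a distance-proportional penalty to attention scores instead of using positional embeddings. The following is a minimal plain-Python sketch, not the repository's PyTorch code; the function names `alibi_slopes` and `alibi_bias` are hypothetical, and the slope formula assumes the number of heads is a power of two.

```python
def alibi_slopes(n_heads):
    """Per-head slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...

    Assumption: n_heads is a power of two (other head counts need
    a different construction that is not covered here).
    """
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to attention scores before the softmax for one head.

    Query position i attending to key position j <= i gets the penalty
    slope * (j - i), which is zero on the diagonal and more negative for
    more distant keys. Entries with j > i are left at 0.0 here; a causal
    mask is assumed to be applied separately.
    """
    return [
        [slope * (j - i) if j <= i else 0.0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
bias = alibi_bias(4, slopes[0])
for row in bias:
    print(row)
```

Each head uses a different slope, so some heads focus on nearby tokens while others keep a wider view.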
✨ Low-Rank Adaptation (LoRA)
✨ Eleuther GPT-NeoX
✨ Diffusion models
- Denoising Diffusion Probabilistic Models (DDPM)
- Denoising Diffusion Implicit Models (DDIM)
- Latent Diffusion Models
- Stable Diffusion
✨ Generative Adversarial Networks
- Original GAN
- GAN with deep convolutional network
- Cycle GAN
- Wasserstein GAN
- Wasserstein GAN with Gradient Penalty
- StyleGAN 2
✨ Recurrent Highway Networks
✨ LSTM
✨ HyperNetworks - HyperLSTM
✨ ResNet
✨ ConvMixer
✨ Capsule Networks
✨ U-Net
✨ Sketch RNN
✨ Graph Neural Networks
✨ Counterfactual Regret Minimization (CFR)
Solving games with incomplete information such as poker with CFR.
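CFR is built on regret matching, which plays each action in proportion to its accumulated positive regret. The sketch below shows only that building block on rock-paper-scissors against a fixed opponent; it is not full CFR and not the repository's code. The names `PAYOFF`, `regret_matching`, and `train`, and the opponent strategy, are assumptions chosen for illustration.

```python
# Payoff for our player: PAYOFF[a][b], with actions 0=rock, 1=paper, 2=scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(cumulative_regrets):
    """Play actions in proportion to positive cumulative regret,
    or uniformly when no action has positive regret."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(cumulative_regrets)] * len(cumulative_regrets)


def train(opponent, iterations=1000):
    """Run regret matching against a fixed opponent strategy and
    return the average strategy over all iterations."""
    n = len(PAYOFF)
    cumulative_regrets = [0.0] * n
    strategy_sum = [0.0] * n
    # Expected utility of each of our actions against the opponent's mix.
    utilities = [
        sum(opponent[b] * PAYOFF[a][b] for b in range(n)) for a in range(n)
    ]
    for _ in range(iterations):
        strategy = regret_matching(cumulative_regrets)
        value = sum(strategy[a] * utilities[a] for a in range(n))
        for a in range(n):
            # Regret: how much better action a would have done than the mix.
            cumulative_regrets[a] += utilities[a] - value
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


# Hypothetical opponent that plays rock slightly too often.
average_strategy = train(opponent=[0.4, 0.3, 0.3])
print(average_strategy)
```

Against an opponent who over-plays rock, the average strategy concentrates on paper. Full CFR applies the same update at every information set of a game tree, weighted by reach probabilities.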
✨ Reinforcement Learning
- Proximal Policy Optimization with Generalized Advantage Estimation
- Deep Q Networks with Dueling Network, Prioritized Replay, and Double Q Network
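Generalized Advantage Estimation, mentioned above alongside PPO, can be sketched in a few lines. This is a minimal plain-Python version under simplifying assumptions: a single trajectory with no episode termination inside it. The function name `gae_advantages` and its parameters are hypothetical, not the repository's API.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate for the state after the final step.
    Assumption: the trajectory does not cross an episode boundary.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


print(gae_advantages([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 0.0, gamma=1.0, lam=1.0))
```

With `lam=0` the result reduces to one-step TD errors, and with `lam=1` it becomes the discounted return minus the value baseline; intermediate values trade bias against variance.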
✨ Optimizers
- Adam
- AMSGrad
- Adam Optimizer with warmup
- Noam Optimizer
- Rectified Adam Optimizer
- AdaBelief Optimizer
- Sophia-G Optimizer
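To make the first entry in the list above concrete, here is a minimal sketch of a single Adam update on a scalar parameter, written in plain Python rather than PyTorch. The function name `adam_step` and the toy objective are assumptions for illustration; the hyperparameter defaults follow the commonly used values.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    m and v are the running first and second moment estimates,
    and t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective (an assumption for this example): f(x) = (x - 3)^2.
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
print(x)
```

On the first step the bias-corrected update has magnitude close to `lr` regardless of the gradient scale, which is one reason Adam is relatively insensitive to gradient magnitude.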
✨ Normalization Layers
- Batch Normalization
- Layer Normalization
- Instance Normalization
- Group Normalization
- Weight Standardization
- Batch-Channel Normalization
- DeepNorm
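As an illustration of one of the layers listed above, layer normalization standardizes the features of a single sample to zero mean and unit variance, then applies an optional learned scale and shift. The sketch below is plain Python over a list of floats, not the repository's PyTorch module; the name `layer_norm` and its parameters are hypothetical.

```python
import math


def layer_norm(x, gain=None, bias=None, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance.

    gain and bias are optional per-feature scale and shift lists;
    when omitted, the normalized values are returned unchanged.
    """
    n = len(x)
    mean = sum(x) / n
    # Biased variance (divide by n).
    var = sum((xi - mean) ** 2 for xi in x) / n
    normalized = [(xi - mean) / math.sqrt(var + eps) for xi in x]
    if gain is not None:
        normalized = [g * y for g, y in zip(gain, normalized)]
    if bias is not None:
        normalized = [y + b for y, b in zip(normalized, bias)]
    return normalized


out = layer_norm([1.0, 2.0, 3.0, 4.0])
print(out)
```

The listed variants differ mainly in which elements share statistics: batch normalization averages over the batch, instance and group normalization over spatial positions within one sample, and layer normalization over the features of one sample as shown here.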
✨ Distillation
✨ Adaptive Computation
✨ Uncertainty
✨ Activations
✨ Language Model Sampling Techniques
✨ Scalable Training/Inference
Installation
pip install labml-nn