huggingface/diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Diffusers 0.38.0: New image and audio pipelines, Core library improvements, and more v0.38.0 Latest

sayakpaul·1mo ago·May 1, 2026

📦 LLaDA2

PR: https://github.com/huggingface/diffusers/pull/13226
Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/llada2](https://huggingface.co/docs/diffusers/main/api/pipelines/llada2)

📦 Nucleus-MoE

NucleusMoE-Image is a 17B-parameter model with 2B active parameters, trained with efficiency at its core. Our novel architecture highlights the scalability of a sparse MoE architecture for image generation.
PR: https://github.com/huggingface/diffusers/pull/13317
Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/nucleusmoe_image](https://huggingface.co/docs/diffusers/main/api/pipelines/nucleusmoe_image)
Thanks to @sippycoder for the contribution.

📦 Ernie-Image

ERNIE-Image is a powerful and highly efficient image generation model with 8B parameters.
PR: https://github.com/huggingface/diffusers/pull/13432
Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/ernie_image](https://huggingface.co/docs/diffusers/main/api/pipelines/ernie_image)
Thanks to @HsiaWinter for the contribution.

📦 LongCat-AudioDiT

LongCat-AudioDiT is a text-to-audio diffusion model from Meituan LongCat.
PR: https://github.com/huggingface/diffusers/pull/13483
Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/longcat_audio_dit](https://huggingface.co/docs/diffusers/main/api/pipelines/longcat_audio_dit)
Thanks to @RuixiangMa for the contribution.

📦 Ace-Step 1.5

PR: [https://github.com/huggingface/diffusers/pull/13095](https://github.com/huggingface/diffusers/pull/13095)
Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/ace_step](https://huggingface.co/docs/diffusers/main/api/pipelines/ace_step)
Thanks to @[ChuxiJ](https://github.com/ChuxiJ) for the contribution.

📦 Modular Pipeline Support

We added modular support for LTX-2 and Hunyuan 1.5.

📦 Core Library

[Flash Attention 4 backend](https://github.com/huggingface/diffusers/issues/13280)
[FlashPack loading](https://github.com/huggingface/diffusers/issues/12700)
[Group offloading + TorchAO](https://github.com/huggingface/diffusers/pull/13276)
[`ring_anything` as a new CP backend](https://github.com/huggingface/diffusers/pull/13545)
[Profiling pipelines in Diffusers](https://github.com/huggingface/diffusers/pull/13356)

📦 All commits

[Discrete Diffusion] Add LLaDA2 pipeline by @kashif in #13226
[LLADA2] documentation fixes by @kashif in #13333
[ci] claude in ci. by @sayakpaul in #13297
[docs] kernels by @stevhliu in #13139
[tests] Tests for conditional pipeline blocks by @sayakpaul in #13247
avoid hardcode device in flux-control example by @kaixuanliu in #13336
fix claude workflow to include id-token with write. by @sayakpaul in #13338
Update LTX-2 Docs to Cover LTX-2.3 Models by @dg845 in #13337
+ 96 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@kashif
[Discrete Diffusion] Add LLaDA2 pipeline (#13226)
[LLADA2] documentation fixes (#13333)
@howardzhang-cv
remove str option for quantization config in torchao (#13291)
change minimum version guard for torchao to 0.15.0 (#13355)
@sippycoder
+ 29 more

Fixes for AutoModel type hints in Modular Pipelines and Flux Klein LoRA loading v0.37.1

DN6·2mo ago·March 25, 2026

📋 Changes

Fix for loading `ModularPipelines` with `AutoModel` type hints in their `modular_model_index.json` #13271
Fix Flux Klein LoRA loading #13313
Fix unguarded `torchvision` import in Cosmos Predict 2.5 #13321

Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥 v0.37.0

sayakpaul·2mo ago·March 5, 2026

📦 Image 🌆

[Z Image Omni Base](https://huggingface.co/docs/diffusers/en/api/pipelines/z_image): Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @RuoyiDu for contributing this in #12857.
[Flux2 Klein](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2#diffusers.Flux2KleinPipeline): FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference that can take under a second. It is built for applications that require real-time image generation without sacrificing quality, and it runs on consumer hardware with as little as 13GB of VRAM.
[Qwen Image Layered](https://huggingface.co/Qwen/Qwen-Image-Layered): Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @naykun for contributing this in #12853.
[FIBO Edit](https://huggingface.co/docs/diffusers/main/en/api/pipelines/bria_fibo_edit): Fibo Edit is an 8B parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized “Edit” models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks to @galbria for contributing it in [https://github.com/huggingface/diffusers/pull/12930](https://github.com/huggingface/diffusers/pull/12930).
[Cosmos Predict2.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos): Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world. Thanks to @miguelmartin75 for contributing it in #12852.
[Cosmos Transfer2.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos): Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control that produces high-quality world simulations conditioned on multiple control inputs. These inputs can take different modalities, including edges, blurred video, segmentation maps, and depth maps. Thanks to @miguelmartin75 for contributing it in #13066.
[GLM-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/glm_image): GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained details. In general image generation quality, it aligns with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @zRzRzRzRzRzRzR for contributing it in [https://github.com/huggingface/diffusers/pull/12973](https://github.com/huggingface/diffusers/pull/12973).
[RAE](https://huggingface.co/docs/diffusers/main/api/models/autoencoder_rae): Representation Autoencoders (aka RAE) are an exciting alternative to traditional VAEs, typically used in latent-space diffusion models for image generation. RAEs leverage pre-trained vision encoders and train lightweight decoders for the task of reconstruction.

📦 Video + audio 🎥 🎼

[LTX-2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2): LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Full and distilled model inference, as well as two-stage inference with spatial sampling, is supported. We also support a conditioning pipeline that allows for passing different conditions (such as images, series of images, etc.). Check out the docs to learn more!
[Helios](https://huggingface.co/docs/diffusers/main/api/pipelines/helios): Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @SHYuanBest for contributing this in [https://github.com/huggingface/diffusers/pull/13208](https://github.com/huggingface/diffusers/pull/13208).

✨ New caching methods

[MagCache](https://github.com/huggingface/diffusers/pull/12744) — thanks to @AlanPonnachan!
[TaylorSeer](https://github.com/huggingface/diffusers/pull/12648/) — thanks to @toilaluan!

✨ New context-parallelism (CP) backends

[Unified Sequence Parallel attention](https://github.com/huggingface/diffusers/pull/12693) — thanks to @Bissmella!
[Ulysses Anything Attention](https://github.com/huggingface/diffusers/pull/12996) — thanks to @DefTruth!

📦 Misc

Mambo-G Guidance: New guider implementation (#12862)
Laplace Scheduler for DDPM (#11320)
Custom Sigmas in UniPCMultistepScheduler (#12109)
MultiControlNet support for SD3 Inpainting (#11251)
Context parallel in native flash attention (#12829)
NPU Ulysses Attention Support (#12919)
Fix Wan 2.1 I2V Context Parallel Inference (#12909)
Fix Qwen-Image Context Parallel Inference (#12970)
+ 5 more

🐛 Bug Fixes

Fix QwenImageEditPlus on NPU (#13017)
Fix MT5Tokenizer → use `T5Tokenizer` for Transformers v5.0+ compatibility (#12877)
Fix Wan/WanI2V patchification (#13038)
Fix LTX-2 inference with `num_videos_per_prompt > 1` and CFG (#13121)
Fix Flux2 img2img prediction (#12855)
Fix QwenImage `txt_seq_lens` handling (#12702)
Fix `prefix_token_len` bug (#12845)
Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)
+ 15 more

📦 All commits

[PRX] Improve model compilation by @WaterKnight1998 in #12787
Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py by @delmalih in #12798
[Modular]z-image by @yiyixuxu in #12808
Fix Qwen Edit Plus modular for multi-image input by @sayakpaul in #12601
[WIP] Add Flux2 modular by @DN6 in #12763
[docs] improve distributed inference cp docs. by @sayakpaul in #12810
post release 0.36.0 by @sayakpaul in #12804
Update distributed_inference.md to correct syntax by @sayakpaul in #12827
+ 231 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@delmalih
Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py (#12798)
Improve docstrings and type hints in scheduling_edm_euler.py (#12871)
Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)
Improve docstrings and type hints in scheduling_consistency_models.py (#12931)
Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py (#12936)
[Docs] Replace root CONTRIBUTING.md with symlink to source docs (#12986)
+ 176 more

Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄 v0.36.0

sayakpaul·5mo ago·December 8, 2025

✨ New image pipelines

[Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2): Flux2 is the latest generation of image generation and editing model from Black Forest Labs. It’s capable of taking multiple input images as reference, making it versatile for different use cases.
[Z-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/z_image): Z-Image is a best-of-its-kind image generation model in the 6B param regime. Thanks to @JerryWu-code for contributing this in [https://github.com/huggingface/diffusers/pull/12703](https://github.com/huggingface/diffusers/pull/12703).
[QwenImage Edit Plus](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage): It’s an upgrade of QwenImage Edit and is capable of taking multiple input images as references. It can act as both a generation and an editing model. Thanks to @naykun for contributing in https://github.com/huggingface/diffusers/issues/12357.
[Bria FIBO](https://huggingface.co/docs/diffusers/main/en/api/pipelines/bria_fibo): FIBO is trained on structured JSON captions up to 1,000+ words and designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs. Thanks to @galbria for contributing this in [https://github.com/huggingface/diffusers/pull/12545](https://github.com/huggingface/diffusers/pull/12545).
[Kandinsky Image Lite](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5_image): Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters). Thanks to @leffff for contributing this in [https://github.com/huggingface/diffusers/pull/12664](https://github.com/huggingface/diffusers/pull/12664).
[ChronoEdit](https://huggingface.co/docs/diffusers/main/en/api/pipelines/chronoedit): ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory. Thanks to @zhangjiewu for contributing this in https://github.com/huggingface/diffusers/pull/12593.

✨ New video pipelines

[Sana-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana_video): Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in [https://github.com/huggingface/diffusers/pull/12634](https://github.com/huggingface/diffusers/pull/12634).
[Kandinsky 5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5_video): Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in [https://github.com/huggingface/diffusers/pull/12478](https://github.com/huggingface/diffusers/pull/12478).
[Hunyuan 1.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video15): HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
[Wan Animate](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#wan-animate-unified-character-animation-and-replacement-with-holistic-replication): Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and driving motion video, it can either animate the character with motion from the driving video, or replace the existing character in that video with that character.

✨ New `kernels`-powered attention backends

Flash Attention 3 (+ its `varlen` variant)
Flash Attention 2 (+ its `varlen` variant)
SAGE
This means that if any of the above backends is supported by your development environment, you can skip the manual process of building the corresponding kernels and just use:
```python
pipe.transformer.set_attention_backend("_flash_3_hub")
```
For more details, check out the [documentation](https://huggingface.co/docs/diffusers/main/en/optimization/attention_backends).

📦 Misc

Reusing `AttentionMixin`: Making certain compatible models subclass from the `AttentionMixin` class helped us get rid of 2K LoC. Going forward, users can expect more such refactorings that will help make the library leaner and simpler. Check out https://github.com/huggingface/diffusers/pull/12463 for more details.
Diffusers backend in SGLang: https://github.com/sgl-project/sglang/pull/14112.
We started the [Diffusers MVP program](https://github.com/huggingface/diffusers/issues/12635) to work with talented community members who will help us improve the library across multiple fronts. Check out the link for more information.

📦 All commits

remove unneeded checkpoint imports. by @sayakpaul in #12488
[tests] fix clapconfig for text backbone in audioldm2 by @sayakpaul in #12490
ltx0.9.8 (without IC lora, autoregressive sampling) by @yiyixuxu in #12493
[docs] Attention checks by @stevhliu in #12486
[CI] Check links by @stevhliu in #12491
[ci] xfail more incorrect transformer imports. by @sayakpaul in #12455
[tests] introduce `VAETesterMixin` to consolidate tests for slicing and tiling by @sayakpaul in #12374
docs: cleanup of runway model by @EazyAl in #12503
+ 147 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@yiyixuxu
ltx0.9.8 (without IC lora, autoregressive sampling) (#12493)
Fix: Add _skip_keys for AutoencoderKLWan (#12523)
HunyuanImage21 (#12333)
[modular] better warn message (#12573)
[modular]pass hub_kwargs to load_config (#12577)
[modular] wan! (#12611)
+ 76 more

🐞 fixes for `transformers` models, imports, v0.35.2

sayakpaul·7mo ago·October 15, 2025

📦 All commits

Release: v0.35.1-patch by @sayakpaul (direct commit on v0.35.2-patch)
handle offload_state_dict when initing transformers models by @sayakpaul in #12438
[CI] Fix TRANSFORMERS_FLAX_WEIGHTS_NAME import issue by @DN6 in #12354
Fix PyTorch 2.3.1 compatibility: add version guard for torch.library.… by @Aishwarya0811 in #12206
fix scale_shift_factor being on cpu for wan and ltx by @vladmandic in #12347
Release: v0.35.2-patch by @sayakpaul (direct commit on v0.35.2-patch)

v0.35.1 for improvements in Qwen-Image Edit v0.35.1

sayakpaul·9mo ago·August 20, 2025

📋 Changes

https://github.com/huggingface/diffusers/pull/12188
https://github.com/huggingface/diffusers/pull/12190

Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more v0.35.0

sayakpaul·9mo ago·August 19, 2025

✨ New pipelines 🧨

We welcomed new pipelines in this release:
Wan 2.2
Flux-Kontext
Qwen-Image
Qwen-Image-Edit

✨ New training scripts 🎛️

Make these newly added models your own with our training scripts:
[Kontext trainer](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md#training-kontext)
[Qwen-Image trainer](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md#training-kontext)

♻️ Attention refactor

Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.

📦 Regional compilation

Thanks to @anijain2305 for contributing this feature in [this PR](https://github.com/huggingface/diffusers/pull/11705).
We have also authored a number of posts that center around the use of `torch.compile`. You can check them out at the links below:
[Presenting Flux Fast: Making Flux go brrr on H100s](https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/)
[torch.compile and Diffusers: A Hands-On Guide to Peak Performance](https://pytorch.org/blog/torch-compile-and-diffusers-a-hands-on-guide-to-peak-performance/)
[Fast LoRA inference for Flux with Diffusers and PEFT](https://huggingface.co/blog/lora-fast)
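As a rough sketch of what regional compilation looks like in practice: instead of compiling the whole denoiser, only its repeated blocks are compiled, which is meant to shorten cold-start compile time. The snippet assumes the `compile_repeated_blocks()` model method and a pipeline whose denoiser is exposed as `pipe.transformer`; the helper name `enable_regional_compilation` is illustrative and not part of the library.

```python
def enable_regional_compilation(pipe, fullgraph=True):
    """Compile only the repeated blocks of the pipeline's denoiser.

    Assumption: `pipe.transformer` exposes `compile_repeated_blocks()`,
    which forwards its keyword arguments to `torch.compile`.
    """
    pipe.transformer.compile_repeated_blocks(fullgraph=fullgraph)
    return pipe
```

Typical usage would be `pipe = enable_regional_compilation(pipe)` right after loading the pipeline and before the first inference call.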

📦 Faster pipeline loading ⚡️

Users can now load pipelines directly on an accelerator device leading to significantly faster load times. This particularly becomes evident when loading large pipelines like Wan and Qwen-Image.
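A minimal sketch of direct-to-accelerator loading, assuming it is requested through the `device_map` argument of `from_pretrained` rather than a later `.to("cuda")` call. The helper name `load_on_accelerator` is hypothetical, and the imports are deferred only so the helper can be defined without `torch` or `diffusers` installed.

```python
def load_on_accelerator(ckpt_id, device="cuda"):
    """Load a pipeline with its weights placed directly on `device`."""
    # Deferred imports: only needed when the helper is actually called.
    import torch
    from diffusers import DiffusionPipeline

    # `device_map` asks the loader to place components on the accelerator
    # while loading, instead of materializing them on CPU first.
    return DiffusionPipeline.from_pretrained(
        ckpt_id, torch_dtype=torch.bfloat16, device_map=device
    )
```

For example, `pipe = load_on_accelerator("Qwen/Qwen-Image")` would load the checkpoint used in this section.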
```diff
from diffusers import DiffusionPipeline
import torch
ckpt_id = "Qwen/Qwen-Image"
pipe = DiffusionPipeline.from_pretrained(
    ckpt_id, torch_dtype=torch.bfloat16
).to("cuda")
```
+ 8 more

📦 Better GGUF integration

@Isotr0py contributed support for native GGUF CUDA kernels in [this PR](https://github.com/huggingface/diffusers/pull/11869). This should provide an approximately 10% improvement in inference speed.
We now support loading of Diffusers format GGUF checkpoints.
You can learn more about all of this in our [GGUF official docs](https://huggingface.co/docs/diffusers/main/en/quantization/gguf).
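The sketch below shows one way a GGUF checkpoint might be loaded with the native kernels switched on. It assumes the kernels are opted into through the `DIFFUSERS_GGUF_CUDA_KERNELS` environment variable (set before `diffusers` is imported, to be safe) and that the optional `kernels` package is installed; both helper names are hypothetical, and `ckpt_path` stands for any GGUF file path or URL you supply.

```python
import os


def enable_gguf_cuda_kernels():
    """Opt in to the native GGUF CUDA kernels (assumed environment flag)."""
    os.environ["DIFFUSERS_GGUF_CUDA_KERNELS"] = "true"


def load_gguf_transformer(ckpt_path):
    """Load a GGUF-quantized Flux transformer from a single file."""
    # Set the flag first, then import diffusers so the flag is picked up.
    enable_gguf_cuda_kernels()
    import torch
    from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

    return FluxTransformer2DModel.from_single_file(
        ckpt_path,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16,
    )
```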

📦 Modular Diffusers (Experimental)

The API is currently in active development and is being released as an experimental feature. Learn more in our [docs](https://huggingface.co/docs/diffusers/main/en/modular_diffusers/overview).

📦 All commits

[tests] skip instead of returning. by @sayakpaul in #11793
adjust to get CI test cases passed on XPU by @kaixuanliu in #11759
fix deprecation in lora after 0.34.0 release by @sayakpaul in #11802
[chore] post release v0.34.0 by @sayakpaul in #11800
Follow up for Group Offload to Disk by @DN6 in #11760
[rfc][compile] compile method for DiffusionPipeline by @anijain2305 in #11705
[tests] add a test on torch compile for varied resolutions by @sayakpaul in #11776
adjust tolerance criteria for `test_float16_inference` in unit test by @kaixuanliu in #11809
+ 173 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@vuongminh1907
update: FluxKontextInpaintPipeline support (#11820)
@Net-Mist
feat: add multiple input image support in Flux Kontext (#11880)
@tolgacangoz
Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
@naykun
+ 7 more

Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more v0.34.0

sayakpaul·11mo ago·June 24, 2025

📦 Wan VACE

Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Boundary Box, etc.). Recommended library for preprocessing videos to obtain control videos: [huggingface/controlnet_aux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan)
Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
Inpainting and Outpainting
Subject to Video (faces, object, characters, etc.)
Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)
The code snippets available in [this](https://github.com/huggingface/diffusers/pull/11582) pull request demonstrate some examples of how videos can be generated with controllability signals.
Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#any-to-video-controllable-generation) to learn more.

📦 Cosmos Predict2 Video2World

The Video2World model comes in 2B and 14B variants. Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos) to learn more.

📦 LTX 0.9.7 and Distilled

LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.
Check out the [docs](https://huggingface.co/docs/diffusers/en/api/pipelines/ltx_video) to learn more.

📦 FusionX

```python
import torch
from diffusers import WanTransformer3DModel
# Load the FusionX transformer weights from a single safetensors file.
transformer = WanTransformer3DModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16,
)
```
To load the LoRAs, use `load_lora_weights()`:
+ 9 more

📦 Chroma

Thanks to @Ednaordinary for contributing it in [this PR](https://github.com/huggingface/diffusers/pull/11698)!

📦 VisualCloze

1. Support for various in-domain tasks
2. Generalization to unseen tasks through in-context learning
3. Unify multiple tasks into one step and generate both target image and intermediate results
4. Support reverse-engineering conditions from target images

📦 Better `torch.compile` support

https://github.com/huggingface/diffusers/pull/11085
https://github.com/huggingface/diffusers/issues/11430
Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:
<details>
<summary>Code</summary>
```py
import torch
from diffusers import DiffusionPipeline
```
</details>
+ 67 more

📦 PipelineQuantizationConfig

Users can now provide a quantization config while initializing a pipeline:
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig
pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
```
+ 9 more

📦 Group offloading with disk

Group offloading still needs a considerable amount of system RAM to work effectively, so environments with both low VRAM and low RAM would still not work.
Starting with this release, users additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set `offload_to_disk_path` to enable this feature.
```python
pipeline.transformer.enable_group_offload(
    onload_device="cuda",
    offload_device="cpu",
    offload_type="leaf_level",
    offload_to_disk_path="path/to/disk"
```
+ 2 more

✨ New training scripts

We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out [this resource](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/sana) for more details. Thanks to @scxue and @lawrence-cj for contributing it in [this PR](https://github.com/huggingface/diffusers/pull/11514).
HiDream LoRA DreamBooth training script ([docs](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_hidream.md)). The script supports training with quantization. [HiDream](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream) is an MIT-licensed model. So, make it yours with this training script.

📦 Updates on educational materials on quantization

We have worked on a two-part series discussing the support of quantization in Diffusers. Check them out:
[Exploring Quantization Backends in Diffusers](https://huggingface.co/blog/diffusers-quantization)
[(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware](https://huggingface.co/blog/flux-qlora)

📦 All commits

[LoRA] support musubi wan loras. by @sayakpaul in #11243
fix test_vanilla_funetuning failure on XPU and A100 by @yao-matrix in #11263
make test_stable_diffusion_inpaint_fp16 pass on XPU by @yao-matrix in #11264
make test_dict_tuple_outputs_equivalent pass on XPU by @yao-matrix in #11265
add onnxruntime-qnn & onnxruntime-cann by @xieofxie in #11269
make test_instant_style_multiple_masks pass on XPU by @yao-matrix in #11266
[BUG] Fix convert_vae_pt_to_diffusers bug by @lavinal712 in #11078
Fix LTX 0.9.5 single file by @hlky in #11271
+ 259 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@yao-matrix
fix test_vanilla_funetuning failure on XPU and A100 (#11263)
make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
make test_instant_style_multiple_masks pass on XPU (#11266)
make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
+ 91 more

v0.33.1: fix ftfy import v0.33.1

yiyixuxu·1y ago·April 10, 2025

📦 All commits

fix ftfy import for wan pipelines by @yiyixuxu in #11262

Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more v0.33.0

sayakpaul·1y ago·April 9, 2025

📦 Wan 2.1

`Wan-AI/Wan2.1-T2V-1.3B-Diffusers`
`Wan-AI/Wan2.1-T2V-14B-Diffusers`
`Wan-AI/Wan2.1-I2V-14B-480P-Diffusers`
`Wan-AI/Wan2.1-I2V-14B-720P-Diffusers`
Check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan) to learn more.

📦 LTX Video 0.9.5

To support these additional conditioning inputs, we’ve introduced the `LTXConditionPipeline` and `LTXVideoCondition` object.
To learn more about the usage, check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).

📦 Hunyuan Image to Video

To learn more, check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).

📦 Others

[EasyAnimateV5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/easyanimate) (thanks to @bubbliiiing for contributing this in [this PR](https://github.com/huggingface/diffusers/pull/10626))
[ConsisID](https://huggingface.co/docs/diffusers/main/en/using-diffusers/consisid) (thanks to @SHYuanBest for contributing this in [this PR](https://github.com/huggingface/diffusers/pull/10140))

📦 Sana-Sprint

Shoutout to @lawrence-cj for their help and guidance on [this PR](https://github.com/huggingface/diffusers/pull/11074).
Check out the [pipeline docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana_sprint) of SANA-Sprint to learn more.

📦 Lumina2

Lumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license.

📦 Others

[CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4) (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in [this PR](https://github.com/huggingface/diffusers/pull/10649))

📦 Layerwise Casting

PyTorch supports `torch.float8_e4m3fn` and `torch.float8_e5m2` as weight storage `dtypes`, but they can’t be used for computation on many devices due to unimplemented kernel support.
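The idea behind layerwise casting is to keep weights in a float8 storage dtype and upcast them to a compute dtype only while a layer runs. A minimal sketch of enabling it, assuming the model exposes `enable_layerwise_casting()` with `storage_dtype` and `compute_dtype` arguments; the helper name and its default dtype choices are illustrative only.

```python
def enable_fp8_weight_storage(model, storage_dtype=None, compute_dtype=None):
    """Store weights in a low-precision dtype and compute in a higher one."""
    if storage_dtype is None:
        import torch  # deferred: only needed for the default dtypes
        storage_dtype = torch.float8_e4m3fn
    if compute_dtype is None:
        import torch
        compute_dtype = torch.bfloat16
    # Weights stay in `storage_dtype` and are upcast per layer at forward time.
    model.enable_layerwise_casting(
        storage_dtype=storage_dtype, compute_dtype=compute_dtype
    )
    return model
```

It would typically be applied to the denoiser only, for example `enable_fp8_weight_storage(pipe.transformer)`.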
<details>
<summary>Code</summary>
```py
import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video
model_id = "THUDM/CogVideoX-5b"
```
</details>
+ 16 more

📦 Group Offloading

You can also use `record_stream=True` when using `use_stream=True` to obtain more speedups at the expense of slightly increased memory usage.
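A short sketch of the stream options mentioned above, assuming a model that exposes `enable_group_offload()`. The helper name is hypothetical, and the devices are passed in by the caller (for example `torch.device("cuda")` and `torch.device("cpu")`).

```python
def enable_streamed_group_offload(model, onload_device, offload_device):
    """Group-offload `model` and overlap transfers with compute via streams."""
    model.enable_group_offload(
        onload_device=onload_device,
        offload_device=offload_device,
        offload_type="leaf_level",
        use_stream=True,      # overlap weight transfers with computation
        record_stream=True,   # extra speedup for slightly more memory
    )
    return model
```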
<details>
<summary>Code</summary>
```py
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video
onload_device = torch.device("cuda")
```
</details>
+ 35 more

📦 Remote Components

| Pipeline | Endpoint | VAE model |
|---------------------|---------------------------------------------------------------------|--------------------------------------|
| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud | stabilityai/sd-vae-ft-mse |
| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud | madebyollin/sdxl-vae-fp16-fix |
| Flux | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud | black-forest-labs/FLUX.1-schnell |
| HunyuanVideo | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud | hunyuanvideo-community/HunyuanVideo |
This is an example of using remote decoding with the Hunyuan Video pipeline:
+ 29 more

📦 Introducing Cached Inference for DiTs

Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/cache) to learn more about the available caching methods.
Pyramid Attention Broadcast
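As a hedged sketch of how this cache is switched on: a `PyramidAttentionBroadcastConfig` is built and handed to the transformer's `enable_cache()` method. The skip range and timestep window below are illustrative values rather than recommendations, and both helper names are hypothetical.

```python
def pab_config_kwargs(pipe):
    """Illustrative settings for Pyramid Attention Broadcast."""
    return {
        # Recompute spatial attention only every 2nd block call; reuse otherwise.
        "spatial_attention_block_skip_range": 2,
        # Only skip while the current timestep lies inside this window.
        "spatial_attention_timestep_skip_range": (100, 800),
        # Lets the cache query the pipeline's current denoising timestep.
        "current_timestep_callback": lambda: pipe.current_timestep,
    }


def enable_pyramid_attention_broadcast(pipe):
    """Attach a PAB cache to the pipeline's transformer."""
    from diffusers import PyramidAttentionBroadcastConfig  # deferred import

    config = PyramidAttentionBroadcastConfig(**pab_config_kwargs(pipe))
    pipe.transformer.enable_cache(config)
    return pipe
```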
<details>
<summary>Code</summary>
```py
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
```
</details>
+ 27 more

📦 Quanto Backend

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig
model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
```
+ 21 more

📦 Improved loading for `uintx` TorchAO checkpoints with `torch>=2.6`

Torch 2.6 allows adding expected Tensors to torch safe globals, which lets us directly load TorchAO checkpoints with these objects.
```diff
- state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
- with init_empty_weights():
-     transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
- transformer.load_state_dict(state_dict, strict=True, assign=True)
+ transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_uint4wo/")
```

📦 LoRAs

We have shipped a couple of improvements on the LoRA front in this release.
🚨 Improved coverage for loading non-diffusers LoRA checkpoints for Flux
`torch.compile()` support when hotswapping LoRAs without triggering recompilation
Check out the [docs](https://huggingface.co/docs/diffusers/en/using-diffusers/loading_adapters#hotswapping-lora-adapters) to learn more about this feature.
The other major change is support for loading LoRAs into quantized model checkpoints.

📦 `dtype` Maps for Pipelines

Since various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:
```python
from diffusers import HunyuanVideoPipeline
import torch
pipe = HunyuanVideoPipeline.from_pretrained(
"hunyuanvideo-community/HunyuanVideo",
torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
+ 2 more
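The lookup rule implied by such a map can be sketched in plain Python. This is an illustration only, not the library's implementation: `resolve_dtype` is a hypothetical helper, dtypes are written as strings so the snippet runs without PyTorch, and the `float32` fallback for a map with no `"default"` key is an assumption.

```python
def resolve_dtype(component_name, dtype_map, fallback="float32"):
    """Pick the dtype for one pipeline component (hypothetical helper).

    A component named explicitly in the map wins; anything else uses the
    "default" entry, and `fallback` applies only if "default" is missing.
    """
    if component_name in dtype_map:
        return dtype_map[component_name]
    return dtype_map.get("default", fallback)


# Same shape as the map passed to `torch_dtype`, with strings standing in
# for torch dtypes.
dtype_map = {"transformer": "bfloat16", "default": "float16"}

# Component names chosen for illustration.
components = ["transformer", "vae", "text_encoder"]
resolved = {name: resolve_dtype(name, dtype_map) for name in components}
print(resolved)
```

Only the transformer is named in the map, so every other component picks up the `"default"` entry.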

📦 AutoModel

This release includes an AutoModel object similar to the one found in `transformers` that automatically fetches the appropriate model class for the provided repo.
```python
from diffusers import AutoModel
unet = AutoModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
```
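One way to picture the class lookup is a registry keyed by the class name stored in a model's config. The sketch below is a simplified illustration under stated assumptions, not the actual `AutoModel` code: it assumes the config records its class under a `_class_name` key, and `MODEL_REGISTRY`, `resolve_model_class`, and the two stand-in classes are hypothetical names defined only so the snippet runs without `diffusers` installed.

```python
import json


# Stand-in classes: placeholders for real model classes.
class UNet2DConditionModel:
    pass


class AutoencoderKL:
    pass


# Hypothetical registry mapping config class names to Python classes.
MODEL_REGISTRY = {
    "UNet2DConditionModel": UNet2DConditionModel,
    "AutoencoderKL": AutoencoderKL,
}


def resolve_model_class(config_text):
    """Return the class named by the `_class_name` field of a JSON config."""
    config = json.loads(config_text)
    class_name = config.get("_class_name")
    if class_name not in MODEL_REGISTRY:
        raise ValueError(f"Unsupported model class: {class_name!r}")
    return MODEL_REGISTRY[class_name]


# A trimmed, made-up config for illustration.
unet_config = '{"_class_name": "UNet2DConditionModel", "in_channels": 4}'
model_cls = resolve_model_class(unet_config)
print(model_cls.__name__)
```

The caller never names the class; the config decides which one is instantiated.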

📦 All commits

[Sana 4K] Add vae tiling option to avoid OOM by @leisuzz in #10583
IP-Adapter for `StableDiffusion3Img2ImgPipeline` by @guiyrt in #10589
[DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 by @chenjy2003 in #10595
Move buffers to device by @hlky in #10523
[Docs] Update SD3 ip_adapter model_id to diffusers checkpoint by @guiyrt in #10597
Scheduling fixes on MPS by @hlky in #10549
[Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo by @chengzeyi in #10544
NPU adaption for RMSNorm by @leisuzz in #10534
+ 297 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@guiyrt
IP-Adapter for `StableDiffusion3Img2ImgPipeline` (#10589)
[Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
`MultiControlNetUnionModel` on SDXL (#10747)
SD3 IP-Adapter runtime checkpoint conversion (#10718)
Comprehensive type checking for `from_pretrained` kwargs (#10758)
Multi IP-Adapter for Flux pipelines (#10867)
+ 96 more

v0.32.2

DN6·1y ago·January 15, 2025

GitHub

📋 Changes

Fixes a regression in loading Comfy UI format single file checkpoints for Flux
Fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models
Adds `unload_lora_weights` for Flux Control
Fixes a bug that prevents Hunyuan Video from running with batch size > 1
Allow Hunyuan Video to load LoRAs created from the original repository code

📦 All commits

[Single File] Fix loading Flux Dev finetunes with Comfy Prefix by @DN6 in #10545
[CI] Update HF Token on Fast GPU Model Tests by @DN6 in #10570
[CI] Update HF Token in Fast GPU Tests by @DN6 in #10568
Fix batch > 1 in HunyuanVideo by @hlky in #10548
Fix HunyuanVideo produces NaN on PyTorch<2.5 by @hlky in #10482
Fix hunyuan video attention mask dim by @a-r-r-o-w in #10454
[LoRA] Support original format loras for HunyuanVideo by @a-r-r-o-w in #10376
[LoRA] feat: support loading loras into 4bit quantized Flux models. by @sayakpaul in #10578
+ 4 more

v0.32.1

a-r-r-o-w·1y ago·December 25, 2024

GitHub

📋 Changes

Importing Diffusers would raise an error in PyTorch versions lower than 2.3.0. This should no longer be a problem.
Device Map does not work as expected when using the quantizer. We now raise an error if it is used. Support for using device maps with different quantization backends will be added in the near future.
Quantization was not performed due to faulty logic. This is now fixed and better tested.

📦 All commits

make style for https://github.com/huggingface/diffusers/pull/10368 by @yiyixuxu in #10370
fix test pypi installation in the release workflow by @sayakpaul in #10360
Fix TorchAO related bugs; revert device_map changes by @a-r-r-o-w in #10371

Diffusers 0.32.0: New video pipelines, new image pipelines, new quantization backends, new training scripts, and more

sayakpaul·1y ago·December 23, 2024

GitHub

✨ New Video Generation Pipelines 📹

Open video generation models are on the rise, and we’re pleased to provide comprehensive integration support for all of them. The following video pipelines are bundled in this release:
[Mochi-1](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi)
[Allegro](https://huggingface.co/docs/diffusers/main/en/api/pipelines/allegro)
[LTXVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video)
[HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video)
Check out [this section](https://www.notion.so/Diffusers-0-32-0-release-15f1384ebcac8091ac5bf18c128639ab?pvs=21) to learn more about the fine-tuning options available for these new video models.

✨ New Image Generation Pipelines

SANA
[Text-to-image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana#diffusers.SanaPipeline)
[PAG](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana#diffusers.SanaPAGPipeline)
Flux Control (including Control LoRA)
[Depth Control](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#depth-control)
[Canny Control](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#canny-control)
[Flux Redux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#redux)
[Flux Fill Inpainting / Outpainting](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#fill-inpaintingoutpainting)
+ 33 more

📦 Acknowledgements

Shoutout to @lawrence-cj and @chenjy2003 for contributing SANA in [this PR](https://github.com/huggingface/diffusers/pull/9982). SANA also features a Deep Compression Autoencoder, which was contributed by @lawrence-cj in [this PR](https://github.com/huggingface/diffusers/pull/9708).
Shoutout to @guiyrt for contributing SD3.5 IP Adapter in [this PR](https://github.com/huggingface/diffusers/pull/9987).

✨ New Quantization Backends

[TorchAO](https://huggingface.co/docs/diffusers/main/en/quantization/torchao)
[GGUF](https://huggingface.co/docs/diffusers/main/en/quantization/gguf)
Please be aware of the following caveats:
TorchAO quantized checkpoints cannot be serialized in `safetensors` currently. This may change in the future.
GGUF currently only supports loading pre-quantized checkpoints into models in this release. Support for saving models with GGUF quantization will be added in the future.

✨ New training scripts

This release features many new training scripts for the community to play with:
[Flux Control](https://github.com/huggingface/diffusers/tree/main/examples/flux-control)
[Mochi-1](https://github.com/a-r-r-o-w/finetrainers)
[LTXVideo](https://github.com/a-r-r-o-w/finetrainers?tab=readme-ov-file#quickstart)
[SANA](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sana.md)
[Hunyuan Video](https://github.com/a-r-r-o-w/finetrainers?tab=readme-ov-file#quickstart)

📦 All commits

post-release 0.31.0 by @sayakpaul in #9742
fix bug in `require_accelerate_version_greater` by @faaany in #9746
[Official callbacks] SDXL Controlnet CFG Cutoff by @asomoza in #9311
[SD3-5 dreambooth lora] update model cards by @linoytsaban in #9749
config attribute not foud error for FluxImagetoImage Pipeline for multi controlnet solved by @rshah240 in #9586
Some minor updates to the nightly and push workflows by @sayakpaul in #9759
[Docs] fix docstring typo in SD3 pipeline by @shenzhiy21 in #9765
[bugfix] bugfix for npu free memory by @leisuzz in #9640
+ 253 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@faaany
fix bug in `require_accelerate_version_greater` (#9746)
make `pipelines` tests device-agnostic (part1) (#9399)
make `pipelines` tests device-agnostic (part2) (#9400)
@linoytsaban
[SD3-5 dreambooth lora] update model cards (#9749)
[SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)
+ 105 more

v0.31.0

sayakpaul·1y ago·October 22, 2024

GitHub

📦 Stable Diffusion 3.5 Large

A regular one
A timestep-distilled one enabling few-step inference
Make sure to fill out the form on the [model page](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), and then run `huggingface-cli login` before running the code below.
```python
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_pretrained(
"stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
+ 12 more

📦 Cogview3-plus

We added a new text-to-image model, Cogview3-plus, from the THUDM team! The model is DiT-based and supports image generation from 512 to 2048px. Thanks to @zRzRzRzRzRzRzR for contributing it!
```python
from diffusers import CogView3PlusPipeline
import torch
pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16).to("cuda")
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
+ 11 more

📦 Quantization

The example below shows how to run Flux.1 Dev with the NF4 data-type. Make sure you install the libraries:
```bash
pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
pip install -Uq diffusers
```
```python
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
+ 32 more

📦 Training scripts

We have a fresh bucket of training scripts with this release:
[Advanced Flux.1 trainer](https://huggingface.co/blog/linoyts/new-advanced-flux-dreambooth-lora)
[CogVideoX trainer](https://github.com/huggingface/diffusers/tree/main/examples/cogvideo)

📦 Misc

We now support the loading of different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs.
Loading of Xlabs Flux ControlNets is also now supported. Thanks to @Anghellia for contributing it!

📦 All commits

Feature flux controlnet img2img and inpaint pipeline by @ighoshsubho in #9408
Remove CogVideoX mentions from single file docs; Test updates by @a-r-r-o-w in #9444
set max_shard_size to None for pipeline save_pretrained by @a-r-r-o-w in #9447
adapt masked im2im pipeline for SDXL by @noskill in #7790
[Flux] add lora integration tests. by @sayakpaul in #9353
[training] CogVideoX Lora by @a-r-r-o-w in #9302
Several fixes to Flux ControlNet pipelines by @vladmandic in #9472
[refactor] LoRA tests by @a-r-r-o-w in #9481
+ 106 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@ighoshsubho
Feature flux controlnet img2img and inpaint pipeline (#9408)
flux controlnet control_guidance_start and control_guidance_end implement (#9571)
@noskill
adapt masked im2im pipeline for SDXL (#7790)
@saqlain2204
[Tests] Reduce the model size in the lumina test (#8985)
+ 43 more

v0.30.3: CogVideoX Image-to-Video and Video-to-Video

a-r-r-o-w·1y ago·September 17, 2024

GitHub

📋 Changes

CogVideoXImageToVideoPipeline
CogVideoXVideoToVideoPipeline

📦 CogVideoXImageToVideoPipeline

The code below demonstrates how to use the new image-to-video pipeline:
```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe.enable_model_cpu_offload()
+ 13 more

📦 CogVideoXVideoToVideoPipeline

The code below demonstrates how to use the new video-to-video pipeline:
```python
import torch
from diffusers import CogVideoXDPMScheduler, CogVideoXVideoToVideoPipeline
from diffusers.utils import export_to_video, load_video
pipe = CogVideoXVideoToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-trial", torch_dtype=torch.bfloat16)
pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")
+ 21 more

📦 All commits

[core] Support VideoToVideo with CogVideoX by @a-r-r-o-w in #9333
[core] CogVideoX memory optimizations in VAE encode by @a-r-r-o-w in #9340
[CI] Quick fix for Cog Video Test by @DN6 in #9373
[refactor] move positional embeddings to patch embed layer for CogVideoX by @a-r-r-o-w in #9263
CogVideoX-5b-I2V support by @zRzRzRzRzRzRzR in #9418

v0.30.2: Update from single file default repository

asomoza·1y ago·August 31, 2024

GitHub

📦 All commits

update runway repo for single_file by @yiyixuxu in #9323
Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1 by @DN6 in #9280
[IP Adapter] Fix cache_dir and local_files_only for image encoder by @asomoza in #9272

V0.30.1: CogVideoX-5B & Bug fixes

yiyixuxu·1y ago·August 24, 2024

GitHub

📦 CogVideoX-5B

The code below shows how to generate a video with CogVideoX-5B:
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video
pipe = CogVideoXPipeline.from_pretrained(
"THUDM/CogVideoX-5b",
torch_dtype=torch.bfloat16
+ 14 more

📦 All commits

Update Video Loading/Export to use `imageio` by @DN6 in #9094
[refactor] CogVideoX followups + tiled decoding support by @a-r-r-o-w in #9150
Add Learned PE selection for Auraflow by @cloneofsimo in #9182
[Single File] Fix configuring scheduler via legacy kwargs by @DN6 in #9229
[Flux LoRA] support parsing alpha from a flux lora state dict. by @sayakpaul in #9236
[tests] fix broken xformers tests by @a-r-r-o-w in #9206
Cogvideox-5B Model adapter change by @zRzRzRzRzRzRzR in #9203
[Single File] Support loading Comfy UI Flux checkpoints by @DN6 in #9243

v0.30.0: New Pipelines (Flux, Stable Audio, Kolors, CogVideoX, Latte, and more), New Methods (FreeNoise, SparseCtrl), and New Refactors

sayakpaul·1y ago·August 7, 2024

GitHub

✨ New pipelines

![Untitled](https://github.com/user-attachments/assets/a313ceba-248b-4c09-9f0e-85050b4c3df7)
Image taken from [Lumina’s GitHub](https://github.com/Alpha-VLLM/Lumina-T2X/blob/main/assets/lumina-next.pdf).
This release features many new pipelines. Below, we provide a list:
Audio pipelines 🎼
[Stable Audio](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_audio)
Video pipelines 📹
[Latte](https://huggingface.co/docs/diffusers/main/en/api/pipelines/latte) (thanks to @maxin-cn for the contribution through #8404)
[CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox) (thanks to @zRzRzRzRzRzRzR for the contribution through #9082)
+ 11 more

📦 Perturbed Attention Guidance (PAG)

| Without PAG | With PAG |
|-------------|----------|
| ![](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/pag_0.0_cfg_7.0_mid.png) | ![](https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/pag_3.0_cfg_7.0_mid.png)|
`StableDiffusionPAGPipeline`
`StableDiffusion3PAGPipeline`
`StableDiffusionControlNetPAGPipeline`
`StableDiffusionXLPAGPipeline`
`StableDiffusionXLPAGImg2ImgPipeline`
+ 9 more

📦 AnimateDiff with SparseCtrl

The authors have made two SparseCtrl-specific checkpoints and a Motion LoRA available:
[SparseCtrl Scribble](https://huggingface.co/guoyww/animatediff-sparsectrl-scribble)
[SparseCtrl RGB](https://huggingface.co/guoyww/animatediff-sparsectrl-rgb)
[Motion LoRA v1-5-3](https://huggingface.co/guoyww/animatediff-motion-lora-v1-5-3)
Scribble Interpolation Example:
<table>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png" alt="Image 1"></td>
+ 46 more

📦 FreeNoise for AnimateDiff

FreeNoise is a training-free method that allows extending the generative capabilities of pretrained video diffusion models beyond their existing context/frame limits.
```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerAncestralDiscreteScheduler
from diffusers.utils import export_to_gif
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = EulerAncestralDiscreteScheduler(
+ 16 more
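The idea of processing a long video in overlapping chunks can be sketched as a sliding window over frame indices. This is a rough illustration, not the FreeNoise implementation: `sliding_windows` is a hypothetical helper, the parameter names `context_length` and `context_stride` and their default values are assumptions, and how the library handles leftover frames may differ.

```python
def sliding_windows(num_frames, context_length=16, context_stride=4):
    """Split `num_frames` into overlapping (start, end) windows.

    Each window spans `context_length` frames and starts `context_stride`
    frames after the previous one. A final window is appended if the
    regular stride would leave trailing frames uncovered.
    """
    if num_frames <= context_length:
        return [(0, num_frames)]

    windows = []
    for start in range(0, num_frames - context_length + 1, context_stride):
        windows.append((start, start + context_length))

    # Cover any frames the last regular window did not reach.
    if windows[-1][1] < num_frames:
        windows.append((num_frames - context_length, num_frames))
    return windows


windows = sliding_windows(num_frames=32, context_length=16, context_stride=4)
for start, end in windows:
    print(f"frames {start}..{end - 1}")
```

Because consecutive windows overlap, each frame is seen in more than one window, which is what lets a model with a fixed context length cover a longer sequence.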

♻️ LoRA refactor

To learn more details, please follow [this PR](https://github.com/huggingface/diffusers/pull/8774). If you see any LoRA-related issues stemming from these refactors, please open an issue.

📦 All commits

[Release notification] add some info when there is an error. by @sayakpaul in #8718
Modify FlowMatch Scale Noise by @asomoza in #8678
Fix json WindowsPath crash by @vincedovy in #8662
Motion Model / Adapter versatility by @Arlaz in #8301
[Chore] perform better deprecation for vqmodeloutput by @sayakpaul in #8719
[Advanced dreambooth lora] adjustments to align with canonical script by @linoytsaban in #8406
[Tests] Fix precision related issues in slow pipeline tests by @DN6 in #8720
fix: ValueError when using FromOriginalModelMixin in subclasses #8440 by @fkcptlst in #8454
+ 149 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@DN6
[Tests] Fix precision related issues in slow pipeline tests (#8720)
Remove legacy single file model loading mixins (#8754)
Enforce ordering when running Pipeline slow tests (#8763)
Fix warning in UNetMotionModel (#8756)
Fix indent in dreambooth lora advanced SD 15 script (#8753)
Fix mistake in Single File Docs page (#8765)
+ 54 more

v0.29.2: fix deprecation and LoRA bugs 🐞

sayakpaul·1y ago·June 27, 2024

GitHub

📦 All commits

[SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in #8558
[LoRA] refactor lora conversion utility. by @sayakpaul in #8295
[LoRA] fix conversion utility so that lora dora loads correctly by @sayakpaul in #8688
[Chore] remove deprecation from transformer2d regarding the output class. by @sayakpaul in #8698
[LoRA] fix vanilla fine-tuned lora loading. by @sayakpaul in #8691
Release: v0.29.2 by @sayakpaul (direct commit on v0.29.2-patch)

v0.29.1: SD3 ControlNet, Expanded SD3 `from_single_file` support, Using long Prompts with T5 Text Encoder & Bug fixes

yiyixuxu·1y ago·June 21, 2024

GitHub

📦 SD3 ControlNet

<img width="624" alt="image" src="https://github.com/huggingface/diffusers/assets/46553287/db384753-cfbb-488c-bc74-8280f9bee24e">
```python
import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers.utils import load_image
controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
+ 10 more

📦 Expanded single file support

We now support all available single-file checkpoints for SD3 in diffusers! To load the single-file checkpoint that includes the T5 text encoder:
```python
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_single_file(
"https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
torch_dtype=torch.float16,
)
+ 4 more

📦 Using Long Prompts with the T5 Text Encoder

```python
image = pipe(
prompt=prompt,
negative_prompt="",
num_inference_steps=28,
guidance_scale=4.5,
max_sequence_length=512,
).images[0]
+ 3 more

📦 All commits

Release: v0.29.0 by @sayakpaul (direct commit on v0.29.1-patch)
prepare for patch release by @yiyixuxu (direct commit on v0.29.1-patch)
fix warning log for Transformer SD3 by @sayakpaul in #8496
Add SD3 AutoPipeline mappings by @Beinsezii in #8489
Add Hunyuan AutoPipe mapping by @Beinsezii in #8505
Expand Single File support in SD3 Pipeline by @DN6 in #8517
[Single File Loading] Handle unexpected keys in CLIP models when `accelerate` isn't installed. by @DN6 in #8462
Fix sharding when no device_map is passed by @SunMarc in #8531
+ 5 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@wangqixun
Support SD3 ControlNet and Multi-ControlNet. (#8566)

v0.29.0: Stable Diffusion 3

sayakpaul·1y ago·June 12, 2024

GitHub

This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2403.03206) by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.

As the model is gated, before using it with `diffusers`, you first need to go to the [Stable Diffusion 3 Medium Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate.

```bash
huggingface-cli login
```

The code below shows how to perform text-to-image generation with SD3:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image
```

![image](https://github.com/huggingface/diffusers/assets/22957388/30917935-6649-447e-8bf2-c4c9378562de)

Refer to [our documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) for learning all the optimizations you can apply to SD3 as well as the image-to-image pipeline.

Additionally, we support DreamBooth + LoRA fine-tuning of Stable Diffusion 3 through rectified flow. Check out [this directory](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sd3.md) for more details.

v0.28.2: fix `from_single_file` clip model checkpoint key error 🐞

yiyixuxu·1y ago·June 4, 2024

GitHub

📋 Changes

Change checkpoint key used to identify CLIP models in single file checkpoints by @DN6 in #8319

v0.28.1: HunyuanDiT and Transformer2D model class variants

sayakpaul·1y ago·June 4, 2024

GitHub

📦 Hunyuan DiT

![image](https://github.com/gnobitab/diffusers-hunyuan/assets/1157982/39b99036-c3cb-4f16-bb1a-40ec25eda573)
```python
import torch
from diffusers import HunyuanDiTPipeline
pipe = HunyuanDiTPipeline.from_pretrained(
"Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe.to("cuda")
+ 6 more

📦 All commits

Release: v0.28.0 by @sayakpaul (direct commit on v0.28.1-patch)
[Core] Introduce class variants for `Transformer2DModel` by @sayakpaul in #7647
resolve comflicts by @toshas (direct commit on v0.28.1-patch)
Tencent Hunyuan Team: add HunyuanDiT related updates by @gnobitab in #8240
Tencent Hunyuan Team - Updated Doc for HunyuanDiT by @gnobitab in #8383
[Transformer2DModel] Handle `norm_type` safely while remapping by @sayakpaul in #8370
Release: v0.28.1 by @sayakpaul (direct commit on v0.28.1-patch)

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@gnobitab
Tencent Hunyuan Team: add HunyuanDiT related updates (#8240)
Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)

v0.28.0: Marigold, PixArt Sigma, AnimateDiff SDXL, InstantStyle, VQGAN Training Script, and more

sayakpaul·2y ago·May 27, 2024

GitHub

📦 Marigold

![marigold](https://github.com/huggingface/diffusers/assets/22957388/e704585f-d29e-41f0-8307-2bdcff0f47ac)
_(Image taken from the [official repository](https://github.com/prs-eth/Marigold))_
The code snippet below shows how to use this pipeline for depth estimation:
```python
import diffusers
import torch
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
"prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
+ 9 more

♻️ 🌀 Massive Refactor of `from_single_file` 🌀

Some of the changes introduced in this refactor:
```python
pipe = StableDiffusionPipeline.from_single_file("...", config=<model repo id or local repo path>)
```

📦 PixArt Sigma

<div align="center">
<img src="https://github.com/huggingface/diffusers/assets/22957388/31f2b30b-e46f-4fc9-aeb7-a6dea50b474b" width=700/><br>
<small>(Taken from the <a href="https://pixart-alpha.github.io/PixArt-sigma-project">project website</a>.)</small>
</div>
<br>
```python
import torch
from diffusers import PixArtSigmaPipeline
+ 9 more

📦 AnimateDiff SDXL

```python
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
+ 28 more

📦 Block-wise LoRA

```python
...
adapter_weight_scales = { "unet": { "down": 0, "mid": 1, "up": 0} }
pipe.set_adapters("pixel", adapter_weight_scales)
image = pipe(
prompt, num_inference_steps=30, generator=torch.manual_seed(0)
).images[0]
```
+ 1 more

📦 InstantStyle

```python
...
scale = {
"down": {"block_2": [0.0, 1.0]},
"up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)
```
+ 1 more

📦 ControlNetXS

Thanks to @UmerHA for contributing ControlNet-XS in #5827 and #6772.

📦 Custom Timesteps

```python
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.schedulers import AysSchedules
sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"]
pipe = StableDiffusionXLPipeline.from_pretrained(
"SG161222/RealVisXL_V4.0",
torch_dtype=torch.float16,
variant="fp16",
).to("cuda")
+ 5 more

📦 `device_map` in Pipelines 🧪

```python
from diffusers import DiffusionPipeline
import torch
pipeline = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
device_map="balanced"
)
+ 16 more

✨ New Guides 📑

[ControlNet Outpainting](https://huggingface.co/blog/OzzyGT/outpainting-controlnet): Learn how to do outpainting with a specific [ControlNet model](https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl) trained for this task. This method is best for creative outpainting.
[Differential Diffusion Outpainting](https://huggingface.co/blog/OzzyGT/outpainting-differential-diffusion): Use a [novel framework](https://github.com/exx8/differential-diffusion) that enables customization of the amount of change per pixel or per image region, allowing seamless outpainting. This can be used for expanding images beyond their initial size.
[Outpainting using an Inpaint Model](https://huggingface.co/blog/OzzyGT/outpainting-inpaint-model): Using various techniques, learn how to use a regular inpainting model to do outpainting while preserving the original subject intact. This is ideal for product catalogs.

📦 Official Callbacks

We introduced official callbacks that you can conveniently plug into your pipeline. For example, you can turn off classifier-free guidance after a set fraction of the denoising steps with `SDXLCFGCutoffCallback`.
```python
import torch
from diffusers import DiffusionPipeline, StableDiffusionXLPipeline
from diffusers.callbacks import SDXLCFGCutoffCallback
callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
pipeline = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
+ 11 more
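The effect of a cutoff ratio on the guidance scale can be pictured with a small standalone sketch. It is a simplified model, not the callback's code: `guidance_schedule` is a hypothetical helper, the cutoff index is assumed to be the step count multiplied by the ratio and truncated to an integer, and the exact step at which the library switches guidance off may differ by one.

```python
def guidance_schedule(num_inference_steps, guidance_scale, cutoff_step_ratio):
    """Return the guidance scale applied at each denoising step.

    Steps before the cutoff keep the requested scale; steps from the
    cutoff onward use 0.0, which stands for guidance being disabled.
    """
    cutoff_step = int(num_inference_steps * cutoff_step_ratio)
    return [
        guidance_scale if step < cutoff_step else 0.0
        for step in range(num_inference_steps)
    ]


# Illustrative values: 10 steps, guidance scale 7.5, cutoff ratio 0.4.
schedule = guidance_schedule(num_inference_steps=10, guidance_scale=7.5, cutoff_step_ratio=0.4)
print(schedule)
```

With a ratio of 0.4, roughly the first 40% of the steps run with guidance and the remaining steps run without it.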

📦 Community Pipelines and `from_pipe` API

Read more about `from_pipe` API in our [documentation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading#reuse-a-pipeline) 📃.
Here are four new community pipelines since our last release.

📦 BoxDiff

```python
pipe_box = DiffusionPipeline.from_pipe(
pipe_sd,
custom_pipeline="pipeline_stable_diffusion_boxdiff",
)
pipe_box.enable_model_cpu_offload()
phrases = ["aurora","reindeer","meadow","lake","mountain"]
boxes = [[1,3,512,202], [75,344,421,495], [1,327,508,507], [2,217,507,341], [1,135,509,242]]
+ 15 more

📦 HD-Painter

```python
pipe = DiffusionPipeline.from_pipe(
pipe_box,
custom_pipeline="hd_painter"
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
prompt = "wooden boat"
init_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/images/2.jpg")
+ 4 more

📦 Differential Diffusion

```python
pipeline = DiffusionPipeline.from_pipe(
pipe_sdxl,
custom_pipeline="pipeline_stable_diffusion_xl_differential_img2img",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)
prompt = "a green pear"
negative_prompt = "blurry"
+ 12 more

📦 All Commits

clean dep installation step in push_tests by @sayakpaul in #7382
[LoRA test suite] refactor the test suite and cleanse it by @sayakpaul in #7316
[Custom Pipelines with Custom Components] fix multiple things by @sayakpaul in #7304
Fix typos by @standardAI in #7411
fix: enable unet_3d_condition to support time_cond_proj_dim by @yhZhai in #7364
add: space within docs to calculate mememory usage. by @sayakpaul (direct commit on v0.28.0-release)
Revert "add: space within docs to calculate mememory usage." by @sayakpaul (direct commit on v0.28.0-release)
[Docs] add missing output image by @sayakpaul in #7425
+ 232 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@standardAI
Fix typos (#7411)
[`IP-Adapter`] Fix IP-Adapter Support and Refactor Callback for `StableDiffusionPanoramaPipeline` (#7262)
[`Docs`] Fix typos (#7451)
Fix Tiling in `ConsistencyDecoderVAE` (#7290)
Fix CPU offload in docstring (#7827)
Fix image upcasting (#7858)
+ 46 more

v0.27.2: Fix scheduler `add_noise` 🐞, embeddings in StableCascade, `scale` when using LoRA

sayakpaul·2y ago·March 20, 2024

GitHub

📦 All commits

[scheduler] fix a bug in add_noise by @yiyixuxu in https://github.com/huggingface/diffusers/pull/7386
[LoRA] fix cross_attention_kwargs problems and tighten tests by @sayakpaul in https://github.com/huggingface/diffusers/pull/7388
Fix issue with prompt embeds and latents in SD Cascade Decoder with multiple image embeddings for a single prompt. by @DN6 in https://github.com/huggingface/diffusers/pull/7381

v0.27.1: Clear `scale` argument confusion for LoRA

sayakpaul·2y ago·March 19, 2024

GitHub

📦 All commits

Release: v0.27.0 by @DN6 (direct commit on v0.27.1-patch)
[LoRA] pop the LoRA scale so that it doesn't get propagated to the weeds by @sayakpaul in #7338
Release: 0.27.1-patch by @sayakpaul (direct commit on v0.27.1-patch)

v0.27.0: Stable Cascade, Playground v2.5, EDM-style training, IP-Adapter image embeds, and more

sayakpaul·2y ago·March 14, 2024

GitHub

📦 Stable Cascade

```python
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
import torch
prior = StableCascadePriorPipeline.from_pretrained(
"stabilityai/stable-cascade-prior",
torch_dtype=torch.bfloat16,
).to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ 10 more

📦 Playground v2.5

```python
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"playgroundai/playground-v2.5-1024px-aesthetic",
torch_dtype=torch.float16,
variant="fp16",
).to("cuda")
+ 38 more

📦 EDM-style training support

To train `stabilityai/stable-diffusion-xl-base-1.0` using the EDM formulation, you just have to specify the `--do_edm_style_training` flag in your training command, and voila 🤗
If you’re interested in extending this formulation to other training scripts, we refer you to [this PR](https://github.com/huggingface/diffusers/pull/7126).

📦 Trajectory Consistency Distillation

```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

# Load SDXL in fp16, then swap in the TCD scheduler built from the existing scheduler config.
pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# + 13 more lines (truncated in this changelog)
```

📦 IP-Adapter image embeddings and masking

📜 To learn the exact usage of both features (IP-Adapter image embeddings and masking), refer to our [official guide](https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter).
We thank our community members, @fabiorigano, @asomoza, and @cubiq, for their guidance and input on these features.

📦 Guide on merging LoRAs

Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the `set_adapters` method, which concatenates the weights of the LoRAs to merge.
📜 Take a look at the [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras) guide to learn more about merging in Diffusers.
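
To make the "concatenation" idea concrete, here is a toy, framework-free sketch. It shows that summing two weighted low-rank updates `w_i * B_i @ A_i` gives the same matrix as a single low-rank update built from concatenated factors. This illustrates the linear algebra only and rests on an assumption about what "concatenates" means here. It is not the implementation behind `set_adapters`, and the helper names (`matmul`, `merged_delta_by_sum`, `merged_delta_by_concat`) are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merged_delta_by_sum(adapters, weights):
    """Sum of the weighted low-rank updates w_i * (B_i @ A_i)."""
    total = None
    for (B, A), w in zip(adapters, weights):
        delta = [[w * v for v in row] for row in matmul(B, A)]
        if total is None:
            total = delta
        else:
            total = [[t + d for t, d in zip(tr, dr)] for tr, dr in zip(total, delta)]
    return total


def merged_delta_by_concat(adapters, weights):
    """Same update from concatenated factors: [w_1*B_1 | w_2*B_2] @ [A_1 ; A_2]."""
    n_rows = len(adapters[0][0])
    # Concatenate the scaled B matrices column-wise (along the rank dimension).
    B_cat = [
        [w * v for (B, _), w in zip(adapters, weights) for v in B[i]]
        for i in range(n_rows)
    ]
    # Concatenate the A matrices row-wise (along the rank dimension).
    A_cat = [row for (_, A) in adapters for row in A]
    return matmul(B_cat, A_cat)


# Two rank-1 adapters for a 2x2 weight matrix (toy values).
adapter_1 = ([[1.0], [2.0]], [[1.0, 0.0]])  # B_1 is 2x1, A_1 is 1x2
adapter_2 = ([[0.0], [1.0]], [[2.0, 4.0]])  # B_2 is 2x1, A_2 is 1x2
weights = [0.5, 0.25]

by_sum = merged_delta_by_sum([adapter_1, adapter_2], weights)
by_concat = merged_delta_by_concat([adapter_1, adapter_2], weights)
print(by_sum == by_concat)  # True
```

In Diffusers itself, the per-adapter weights correspond to the `adapter_weights` argument of `set_adapters`. The guide linked above covers real usage.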

📦 LEDITS++

The snippet below shows example usage:
```python
import torch
import PIL
import requests
from io import BytesIO
from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL

device = "cuda"
# + 31 more lines (truncated in this changelog)
```

📦 All commits

Fix flaky IP Adapter test by @DN6 in #6960
Move SDXL T2I Adapter lora test into PEFT workflow by @DN6 in #6965
Allow passing `config_file` argument to ControlNetModel when using `from_single_file` by @DN6 in #6959
[`PEFT` / `docs`] Add a note about torch.compile by @younesbelkada in #6864
[Core] Harmonize single file ckpt model loading by @sayakpaul in #6971
fix: controlnet inpaint single file. by @sayakpaul in #6975
[docs] IP-Adapter by @stevhliu in #6897
fix IPAdapter unload_ip_adapter test by @yiyixuxu in #6972
+ 161 more

📦 Significant community contributions

The following contributors have made significant changes to the library over the last release:
@ihkap11
Fix diffusers import prompt2prompt (#6927)
Refactor Prompt2Prompt: Inherit from DiffusionPipeline (#7211)
@ustcuna
[Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU (#6683)
@rootonchair
IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline (#6941)
+ 29 more

v0.26.3: Patch release to fix DPMSolverSinglestepScheduler and configuring VAE from single file mixin

yiyixuxu·2y ago·February 13, 2024

📦 All commits

Fix configuring VAE from single file mixin by @DN6 in #6950
[DPMSolverSinglestepScheduler] correct `get_order_list` for `solver_order=2` and `lower_order_final=True` by @yiyixuxu in #6953

v0.26.2: Patch fix for adding `self.use_ada_layer_norm_*` params back to `BasicTransformerBlock`

sayakpaul·2y ago·February 6, 2024

📦 All commits

add `self.use_ada_layer_norm_*` params back to `BasicTransformerBlock` by @yiyixuxu in #6841

v0.26.1: Patch release to fix `torchvision` dependency

sayakpaul·2y ago·February 2, 2024

📦 All commits

add is_torchvision_available by @yiyixuxu in #6800
