huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
📦 LLaDA2
- PR: https://github.com/huggingface/diffusers/pull/13226
- Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/llada2](https://huggingface.co/docs/diffusers/main/api/pipelines/llada2)
📦 Nucleus-MoE
- NucleusMoE-Image is a 17B-parameter model with 2B active parameters, trained with efficiency at its core. Its novel architecture highlights the scalability of sparse MoE designs for image generation.
- PR: https://github.com/huggingface/diffusers/pull/13317
- Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/nucleusmoe_image](https://huggingface.co/docs/diffusers/main/api/pipelines/nucleusmoe_image)
- Thanks to @sippycoder for the contribution.
📦 Ernie-Image
- ERNIE-Image is a powerful and highly efficient image generation model with 8B parameters.
- PR: https://github.com/huggingface/diffusers/pull/13432
- Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/ernie_image](https://huggingface.co/docs/diffusers/main/api/pipelines/ernie_image)
- Thanks to @HsiaWinter for the contribution.
📦 LongCat-AudioDiT
- LongCat-AudioDiT is a text-to-audio diffusion model from Meituan LongCat.
- PR: https://github.com/huggingface/diffusers/pull/13483
- Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/longcat_audio_dit](https://huggingface.co/docs/diffusers/main/api/pipelines/longcat_audio_dit)
- Thanks to @RuixiangMa for the contribution.
📦 Ace-Step 1.5
- PR: [https://github.com/huggingface/diffusers/pull/13095](https://github.com/huggingface/diffusers/pull/13095)
- Docs: [https://huggingface.co/docs/diffusers/main/api/pipelines/ace_step](https://huggingface.co/docs/diffusers/main/api/pipelines/ace_step)
- Thanks to @[ChuxiJ](https://github.com/ChuxiJ) for the contribution.
📦 Modular Pipeline Support
- We added modular support for LTX-2 and Hunyuan 1.5.
📦 Core Library
- [Flash Attention 4 backend](https://github.com/huggingface/diffusers/issues/13280)
- [FlashPack loading](https://github.com/huggingface/diffusers/issues/12700)
- [Group offloading + TorchAO](https://github.com/huggingface/diffusers/pull/13276)
- [`ring_anything` as a new CP backend](https://github.com/huggingface/diffusers/pull/13545)
- [Profiling pipelines in Diffusers](https://github.com/huggingface/diffusers/pull/13356)
📦 All commits
- [Discrete Diffusion] Add LLaDA2 pipeline by @kashif in #13226
- [LLADA2] documentation fixes by @kashif in #13333
- [ci] claude in ci. by @sayakpaul in #13297
- [docs] kernels by @stevhliu in #13139
- [tests] Tests for conditional pipeline blocks by @sayakpaul in #13247
- avoid hardcode device in flux-control example by @kaixuanliu in #13336
- fix claude workflow to include id-token with write. by @sayakpaul in #13338
- Update LTX-2 Docs to Cover LTX-2.3 Models by @dg845 in #13337
- + 96 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @kashif
- [Discrete Diffusion] Add LLaDA2 pipeline (#13226)
- [LLADA2] documentation fixes (#13333)
- @howardzhang-cv
- remove str option for quantization config in torchao (#13291)
- change minimum version guard for torchao to 0.15.0 (#13355)
- @sippycoder
- + 29 more
📋 Changes
- Fix for loading `ModularPipelines` with `AutoModel` type hints in their `modular_model_index.json` #13271
- Fix Flux Klein LoRA loading #13313
- Fix unguarded `torchvision` import in Cosmos Predict 2.5 #13321
📦 Image 🌆
- [Z Image Omni Base](https://huggingface.co/docs/diffusers/en/api/pipelines/z_image): Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @RuoyiDu for contributing this in #12857.
- [Flux2 Klein](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2#diffusers.Flux2KleinPipeline): FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference that can take less than a second. It is built for applications that require real-time image generation without sacrificing quality, and it runs on consumer hardware with as little as 13GB of VRAM.
- [Qwen Image Layered](https://huggingface.co/Qwen/Qwen-Image-Layered): Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @naykun for contributing this in #12853.
- [FIBO Edit](https://huggingface.co/docs/diffusers/main/en/api/pipelines/bria_fibo_edit): Fibo Edit is an 8B-parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized “Edit” models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks to @galbria for contributing it in [https://github.com/huggingface/diffusers/pull/12930](https://github.com/huggingface/diffusers/pull/12930).
- [Cosmos Predict2.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos): Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world. Thanks to @miguelmartin75 for contributing it in #12852.
- [Cosmos Transfer2.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos): Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control that produces high-quality world simulations conditioned on multiple control inputs. These inputs can span different modalities, including edges, blurred video, segmentation maps, and depth maps. Thanks to @miguelmartin75 for contributing it in #13066.
- [GLM-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/glm_image): GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained details. In general image generation quality, it aligns with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @zRzRzRzRzRzRzR for contributing it in [https://github.com/huggingface/diffusers/pull/12973](https://github.com/huggingface/diffusers/pull/12973).
- [RAE](https://huggingface.co/docs/diffusers/main/api/models/autoencoder_rae): Representation Autoencoders (RAEs) are an exciting alternative to the traditional VAEs typically used in latent-space diffusion models for image generation. RAEs leverage pre-trained vision encoders and train lightweight decoders for the task of reconstruction.
📦 Video + audio 🎥 🎼
- [LTX-2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2): LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Full and distilled model inference, as well as two-stage inference with spatial sampling, is supported. We also support a conditioning pipeline that allows for passing different conditions (such as images, series of images, etc.). Check out the docs to learn more!
- [Helios](https://huggingface.co/docs/diffusers/main/api/pipelines/helios): Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @SHYuanBest for contributing this in [https://github.com/huggingface/diffusers/pull/13208](https://github.com/huggingface/diffusers/pull/13208).
✨ New caching methods
- [MagCache](https://github.com/huggingface/diffusers/pull/12744) — thanks to @AlanPonnachan!
- [TaylorSeer](https://github.com/huggingface/diffusers/pull/12648/) — thanks to @toilaluan!
✨ New context-parallelism (CP) backends
- [Unified Sequence Parallel attention](https://github.com/huggingface/diffusers/pull/12693) — thanks to @Bissmella!
- [Ulysses Anything Attention](https://github.com/huggingface/diffusers/pull/12996) — thanks to @DefTruth!
📦 Misc
- Mambo-G Guidance: New guider implementation (#12862)
- Laplace Scheduler for DDPM (#11320)
- Custom Sigmas in UniPCMultistepScheduler (#12109)
- MultiControlNet support for SD3 Inpainting (#11251)
- Context parallel in native flash attention (#12829)
- NPU Ulysses Attention Support (#12919)
- Fix Wan 2.1 I2V Context Parallel Inference (#12909)
- Fix Qwen-Image Context Parallel Inference (#12970)
- + 5 more
🐛 Bug Fixes
- Fix QwenImageEditPlus on NPU (#13017)
- Fix MT5Tokenizer → use `T5Tokenizer` for Transformers v5.0+ compatibility (#12877)
- Fix Wan/WanI2V patchification (#13038)
- Fix LTX-2 inference with `num_videos_per_prompt > 1` and CFG (#13121)
- Fix Flux2 img2img prediction (#12855)
- Fix QwenImage `txt_seq_lens` handling (#12702)
- Fix `prefix_token_len` bug (#12845)
- Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)
- + 15 more
📦 All commits
- [PRX] Improve model compilation by @WaterKnight1998 in #12787
- Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py by @delmalih in #12798
- [Modular]z-image by @yiyixuxu in #12808
- Fix Qwen Edit Plus modular for multi-image input by @sayakpaul in #12601
- [WIP] Add Flux2 modular by @DN6 in #12763
- [docs] improve distributed inference cp docs. by @sayakpaul in #12810
- post release 0.36.0 by @sayakpaul in #12804
- Update distributed_inference.md to correct syntax by @sayakpaul in #12827
- + 231 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @delmalih
- Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py (#12798)
- Improve docstrings and type hints in scheduling_edm_euler.py (#12871)
- Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)
- Improve docstrings and type hints in scheduling_consistency_models.py (#12931)
- Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py (#12936)
- [Docs] Replace root CONTRIBUTING.md with symlink to source docs (#12986)
- + 176 more
✨ New image pipelines
- [Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2): Flux2 is the latest image generation and editing model from Black Forest Labs. It can take multiple input images as references, making it versatile across different use cases.
- [Z-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/z_image): Z-Image is a best-of-its-kind image generation model in the 6B param regime. Thanks to @JerryWu-code for contributing this in [https://github.com/huggingface/diffusers/pull/12703](https://github.com/huggingface/diffusers/pull/12703).
- [QwenImage Edit Plus](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage): It’s an upgrade of QwenImage Edit and is capable of taking multiple input images as references. It can act as both a generation and an editing model. Thanks to @naykun for contributing in https://github.com/huggingface/diffusers/issues/12357.
- [Bria FIBO](https://huggingface.co/docs/diffusers/main/en/api/pipelines/bria_fibo): FIBO is trained on structured JSON captions up to 1,000+ words and designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs. Thanks to @galbria for contributing this in [https://github.com/huggingface/diffusers/pull/12545](https://github.com/huggingface/diffusers/pull/12545).
- [Kandinsky Image Lite](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5_image): Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters). Thanks to @leffff for contributing this in [https://github.com/huggingface/diffusers/pull/12664](https://github.com/huggingface/diffusers/pull/12664).
- [ChronoEdit](https://huggingface.co/docs/diffusers/main/en/api/pipelines/chronoedit): ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory. Thanks to @zhangjiewu for contributing this in https://github.com/huggingface/diffusers/pull/12593.
✨ New video pipelines
- [Sana-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana_video): Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in [https://github.com/huggingface/diffusers/pull/12634](https://github.com/huggingface/diffusers/pull/12634).
- [Kandinsky 5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5_video): Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in [https://github.com/huggingface/diffusers/pull/12478](https://github.com/huggingface/diffusers/pull/12478).
- [Hunyuan 1.5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video15): HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
- [Wan Animate](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#wan-animate-unified-character-animation-and-replacement-with-holistic-replication): Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and a driving motion video, it can either animate the character with motion from the driving video, or replace the existing character in that video with the reference character.
✨ New `kernels`-powered attention backends
- Flash Attention 3 (+ its `varlen` variant)
- Flash Attention 2 (+ its `varlen` variant)
- SAGE
- This means that if any of the above backends is supported by your development environment, you should be able to skip the manual process of building the corresponding kernels and just use:

```python
pipe.transformer.set_attention_backend("_flash_3_hub")
```
- For more details, check out the [documentation](https://huggingface.co/docs/diffusers/main/en/optimization/attention_backends).
📦 Misc
- Reusing `AttentionMixin`: Making certain compatible models subclass from the `AttentionMixin` class helped us get rid of 2K LoC. Going forward, users can expect more such refactorings that will help make the library leaner and simpler. Check out https://github.com/huggingface/diffusers/pull/12463 for more details.
- Diffusers backend in SGLang: https://github.com/sgl-project/sglang/pull/14112.
- We started the [Diffusers MVP program](https://github.com/huggingface/diffusers/issues/12635) to work with talented community members who will help us improve the library across multiple fronts. Check out the link for more information.
📦 All commits
- remove unneeded checkpoint imports. by @sayakpaul in #12488
- [tests] fix clapconfig for text backbone in audioldm2 by @sayakpaul in #12490
- ltx0.9.8 (without IC lora, autoregressive sampling) by @yiyixuxu in #12493
- [docs] Attention checks by @stevhliu in #12486
- [CI] Check links by @stevhliu in #12491
- [ci] xfail more incorrect transformer imports. by @sayakpaul in #12455
- [tests] introduce `VAETesterMixin` to consolidate tests for slicing and tiling by @sayakpaul in #12374
- docs: cleanup of runway model by @EazyAl in #12503
- + 147 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @yiyixuxu
- ltx0.9.8 (without IC lora, autoregressive sampling) (#12493)
- Fix: Add _skip_keys for AutoencoderKLWan (#12523)
- HunyuanImage21 (#12333)
- [modular] better warn message (#12573)
- [modular]pass hub_kwargs to load_config (#12577)
- [modular] wan! (#12611)
- + 76 more
📦 All commits
- Release: v0.35.1-patch by @sayakpaul (direct commit on v0.35.2-patch)
- handle offload_state_dict when initing transformers models by @sayakpaul in #12438
- [CI] Fix TRANSFORMERS_FLAX_WEIGHTS_NAME import issue by @DN6 in #12354
- Fix PyTorch 2.3.1 compatibility: add version guard for torch.library.… by @Aishwarya0811 in #12206
- fix scale_shift_factor being on cpu for wan and ltx by @vladmandic in #12347
- Release: v0.35.2-patch by @sayakpaul (direct commit on v0.35.2-patch)
📋 Changes
- https://github.com/huggingface/diffusers/pull/12188
- https://github.com/huggingface/diffusers/pull/12190
✨ New pipelines 🧨
- We welcomed new pipelines in this release:
- Wan 2.2
- Flux-Kontext
- Qwen-Image
- Qwen-Image-Edit
✨ New training scripts 🎛️
- Make these newly added models your own with our training scripts:
- [Kontext trainer](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md#training-kontext)
- [Qwen-Image trainer](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md#training-kontext)
♻️ Attention refactor
- Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.
📦 Regional compilation
- Thanks to @anijain2305 for contributing this feature in [this PR](https://github.com/huggingface/diffusers/pull/11705).
- We have also authored a number of posts that center around the use of `torch.compile`. You can check them out at the links below:
- [Presenting Flux Fast: Making Flux go brrr on H100s](https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/)
- [torch.compile and Diffusers: A Hands-On Guide to Peak Performance](https://pytorch.org/blog/torch-compile-and-diffusers-a-hands-on-guide-to-peak-performance/)
- [Fast LoRA inference for Flux with Diffusers and PEFT](https://huggingface.co/blog/lora-fast)
📦 Faster pipeline loading ⚡️
- Users can now load pipelines directly onto an accelerator device, which leads to significantly faster load times. The difference is especially noticeable when loading large pipelines such as Wan and Qwen-Image.
- ```diff
- from diffusers import DiffusionPipeline
- import torch
- ckpt_id = "Qwen/Qwen-Image"
- pipe = DiffusionPipeline.from_pretrained(
- ckpt_id, torch_dtype=torch.bfloat16
- ).to("cuda")
- + 8 more
📦 Better GGUF integration
- @Isotr0py contributed support for native GGUF CUDA kernels in [this PR](https://github.com/huggingface/diffusers/pull/11869). This should provide an approximately 10% improvement in inference speed.
- We now support loading Diffusers-format GGUF checkpoints.
- You can learn more about all of this in our [GGUF official docs](https://huggingface.co/docs/diffusers/main/en/quantization/gguf).
📦 Modular Diffusers (Experimental)
- The API is currently in active development and is being released as an experimental feature. Learn more in our [docs](https://huggingface.co/docs/diffusers/main/en/modular_diffusers/overview).
📦 All commits
- [tests] skip instead of returning. by @sayakpaul in #11793
- adjust to get CI test cases passed on XPU by @kaixuanliu in #11759
- fix deprecation in lora after 0.34.0 release by @sayakpaul in #11802
- [chore] post release v0.34.0 by @sayakpaul in #11800
- Follow up for Group Offload to Disk by @DN6 in #11760
- [rfc][compile] compile method for DiffusionPipeline by @anijain2305 in #11705
- [tests] add a test on torch compile for varied resolutions by @sayakpaul in #11776
- adjust tolerance criteria for `test_float16_inference` in unit test by @kaixuanliu in #11809
- + 173 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @vuongminh1907
- update: FluxKontextInpaintPipeline support (#11820)
- @Net-Mist
- feat: add multiple input image support in Flux Kontext (#11880)
- @tolgacangoz
- Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
- @naykun
- + 7 more
📦 Wan VACE
- Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Bounding Box, etc.). Recommended library for preprocessing videos to obtain control videos: [huggingface/controlnet_aux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan)
- Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
- Inpainting and Outpainting
- Subject to Video (faces, objects, characters, etc.)
- Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)
- The code snippets available in [this](https://github.com/huggingface/diffusers/pull/11582) pull request demonstrate some examples of how videos can be generated with controllability signals.
- Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#any-to-video-controllable-generation) to learn more.
📦 Cosmos Predict2 Video2World
- The Video2World model comes in 2B and 14B variants. Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos) to learn more.
📦 LTX 0.9.7 and Distilled
- LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.
- Check out the [docs](https://huggingface.co/docs/diffusers/en/api/pipelines/ltx_video) to learn more.
📦 FusionX
```python
import torch
from diffusers import WanTransformer3DModel

# Load the FusionX transformer weights from a single safetensors file.
transformer = WanTransformer3DModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16,
)
```
- To load the LoRAs, use `load_lora_weights()`:
- + 9 more
📦 Chroma
- Thanks to @Ednaordinary for contributing it in [this PR](https://github.com/huggingface/diffusers/pull/11698)!
📦 VisualCloze
- 1. Support for various in-domain tasks
- 2. Generalization to unseen tasks through in-context learning
- 3. Unify multiple tasks into one step and generate both target image and intermediate results
- 4. Support reverse-engineering conditions from target images
📦 Better `torch.compile` support
- https://github.com/huggingface/diffusers/pull/11085
- https://github.com/huggingface/diffusers/issues/11430
- Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:
- <details>
- <summary>Code</summary>
- ```py
- import torch
- from diffusers import DiffusionPipeline
- + 67 more
📦 PipelineQuantizationConfig
- Users can now provide a quantization config while initializing a pipeline:
- ```python
- import torch
- from diffusers import DiffusionPipeline
- from diffusers.quantizers import PipelineQuantizationConfig
- pipeline_quant_config = PipelineQuantizationConfig(
- quant_backend="bitsandbytes_4bit",
- quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
- + 9 more
📦 Group offloading with disk
- Group offloading to CPU still requires a considerable amount of system RAM to work effectively, so environments that are low on both VRAM and RAM still would not work.
- Starting with this release, users additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set `offload_to_disk_path` to enable this feature.
- ```python
- pipeline.transformer.enable_group_offload(
- onload_device="cuda",
- offload_device="cpu",
- offload_type="leaf_level",
- offload_to_disk_path="path/to/disk"
- + 2 more
✨ New training scripts
- We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out [this resource](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/sana) for more details. Thanks to @scxue and @lawrence-cj for contributing it in [this PR](https://github.com/huggingface/diffusers/pull/11514).
- HiDream LoRA DreamBooth training script ([docs](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_hidream.md)). The script supports training with quantization. [HiDream](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream) is an MIT-licensed model. So, make it yours with this training script.
📦 Updates on educational materials on quantization
- We have worked on a two-part series discussing the support of quantization in Diffusers. Check them out:
- [Exploring Quantization Backends in Diffusers](https://huggingface.co/blog/diffusers-quantization)
- [(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware](https://huggingface.co/blog/flux-qlora)
📦 All commits
- [LoRA] support musubi wan loras. by @sayakpaul in #11243
- fix test_vanilla_funetuning failure on XPU and A100 by @yao-matrix in #11263
- make test_stable_diffusion_inpaint_fp16 pass on XPU by @yao-matrix in #11264
- make test_dict_tuple_outputs_equivalent pass on XPU by @yao-matrix in #11265
- add onnxruntime-qnn & onnxruntime-cann by @xieofxie in #11269
- make test_instant_style_multiple_masks pass on XPU by @yao-matrix in #11266
- [BUG] Fix convert_vae_pt_to_diffusers bug by @lavinal712 in #11078
- Fix LTX 0.9.5 single file by @hlky in #11271
- + 259 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @yao-matrix
- fix test_vanilla_funetuning failure on XPU and A100 (#11263)
- make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
- make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
- make test_instant_style_multiple_masks pass on XPU (#11266)
- make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
- make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
- + 91 more
📦 All commits
- fix ftfy import for wan pipelines by @yiyixuxu in #11262
📦 Wan 2.1
- `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`
- `Wan-AI/Wan2.1-T2V-14B-Diffusers`
- `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers`
- `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers`
- Check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan) to learn more.
📦 LTX Video 0.9.5
- To support these additional conditioning inputs, we’ve introduced `LTXConditionPipeline` and the `LTXVideoCondition` object.
- To learn more about the usage, check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
📦 Hunyuan Image to Video
- To learn more, check out the docs [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
📦 Others
- [EasyAnimateV5](https://huggingface.co/docs/diffusers/main/en/api/pipelines/easyanimate) (thanks to @bubbliiiing for contributing this in [this PR](https://github.com/huggingface/diffusers/pull/10626))
- [ConsisID](https://huggingface.co/docs/diffusers/main/en/using-diffusers/consisid) (thanks to @SHYuanBest for contributing this in [this PR](https://github.com/huggingface/diffusers/pull/10140))
📦 Sana-Sprint
- Shoutout to @lawrence-cj for their help and guidance on [this PR](https://github.com/huggingface/diffusers/pull/11074).
- Check out the [pipeline docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana_sprint) of SANA-Sprint to learn more.
📦 Lumina2
- Lumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license.
📦 Others
- [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4) (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in [this PR](https://github.com/huggingface/diffusers/pull/10649))
📦 Layerwise Casting
- PyTorch supports `torch.float8_e4m3fn` and `torch.float8_e5m2` as weight storage `dtypes`, but they can’t be used for computation on many devices due to unimplemented kernel support.
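- The idea behind layerwise casting is to keep weights in a compact storage dtype and upcast them only for the duration of a layer's forward pass. The following is a minimal, library-free sketch of that pattern. It is an analogy and not the Diffusers implementation: the Python standard library has no float8 type, so packed float16 bytes stand in for the storage dtype and Python floats stand in for the compute dtype. `LowPrecisionLinear` is a hypothetical name.

```python
import struct


class LowPrecisionLinear:
    """Toy linear layer: weights are stored as packed float16 bytes
    (storage dtype) and upcast to Python floats (compute dtype) on use."""

    def __init__(self, weights):
        self.rows = len(weights)
        self.cols = len(weights[0])
        flat = [w for row in weights for w in row]
        # "e" is the struct format code for IEEE 754 half precision (2 bytes each).
        self.storage = struct.pack(f"<{len(flat)}e", *flat)

    def _upcast(self):
        # Unpack the stored bytes into full-precision Python floats.
        flat = struct.unpack(f"<{self.rows * self.cols}e", self.storage)
        return [list(flat[r * self.cols:(r + 1) * self.cols]) for r in range(self.rows)]

    def forward(self, x):
        # Upcast just in time; the upcast copy is discarded after the call.
        weights = self._upcast()
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


layer = LowPrecisionLinear([[0.5, 0.25], [1.0, -2.0]])
output = layer.forward([2.0, 4.0])
```

- Only the packed bytes persist between calls, which is where the memory saving comes from; the cost is the extra upcast work in every forward pass.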
- <details>
- <summary>Code</summary>
- ```py
- import torch
- from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
- from diffusers.utils import export_to_video
- model_id = "THUDM/CogVideoX-5b"
- + 16 more
📦 Group Offloading
- You can also use `record_stream=True` when using `use_stream=True` to obtain more speedups at the expense of slightly increased memory usage.
- <details>
- <summary>Code</summary>
- ```py
- import torch
- from diffusers import CogVideoXPipeline
- from diffusers.utils import export_to_video
- onload_device = torch.device("cuda")
- + 35 more
📦 Remote Components
| Pipeline            | Endpoint                                                            | Model                                |
|---------------------|---------------------------------------------------------------------|--------------------------------------|
| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud | stabilityai/sd-vae-ft-mse            |
| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud | madebyollin/sdxl-vae-fp16-fix        |
| Flux                | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud | black-forest-labs/FLUX.1-schnell     |
| HunyuanVideo        | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud | hunyuanvideo-community/HunyuanVideo  |
- This is an example of using remote decoding with the Hunyuan Video pipeline:
- <details>
- + 29 more
📦 Introducing Cached Inference for DiTs
- Check out the [docs](https://huggingface.co/docs/diffusers/main/en/api/cache) to learn more about the available caching methods.
- Pyramid Attention Broadcast
- <details>
- <summary>Code</summary>
- ```py
- import torch
- from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig
- pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
- + 27 more
📦 Quanto Backend
- ```python
- import torch
- from diffusers import FluxTransformer2DModel, QuantoConfig
- model_id = "black-forest-labs/FLUX.1-dev"
- quantization_config = QuantoConfig(weights_dtype="float8")
- transformer = FluxTransformer2DModel.from_pretrained(
- model_id,
- subfolder="transformer",
- + 21 more
📦 Improved loading for `uintx` TorchAO checkpoints with `torch>=2.6`
- Torch 2.6 allows adding expected Tensors to torch safe globals, which lets us directly load TorchAO checkpoints with these objects.
```diff
- state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
- with init_empty_weights():
-     transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
- transformer.load_state_dict(state_dict, strict=True, assign=True)
+ transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_uint4wo/")
```
📦 LoRAs
- We have shipped a couple of improvements on the LoRA front in this release.
- 🚨 Improved coverage for loading non-diffusers LoRA checkpoints for Flux
- `torch.compile()` support when hotswapping LoRAs without triggering recompilation
- Check out the [docs](https://huggingface.co/docs/diffusers/en/using-diffusers/loading_adapters#hotswapping-lora-adapters) to learn more about this feature.
- The other major change is support for loading LoRAs into quantized model checkpoints.
📦 `dtype` Maps for Pipelines
- Since various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:
- ```python
- from diffusers import HunyuanVideoPipeline
- import torch
- pipe = HunyuanVideoPipeline.from_pretrained(
- "hunyuanvideo-community/HunyuanVideo",
- torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
- )
- + 2 more
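One way to read the dtype map is that each component uses its own entry when present and the `"default"` entry otherwise. The helper below is a hypothetical sketch of that lookup, not the pipeline's loading code; it uses strings in place of `torch.dtype` objects and assumes a `float32` fallback when neither key is present.

```python
def resolve_dtypes(component_names, dtype_map, fallback="float32"):
    """Return the dtype each component would be loaded in under a dtype map."""
    # Assumption: a missing "default" key falls back to float32.
    default = dtype_map.get("default", fallback)
    return {name: dtype_map.get(name, default) for name in component_names}


components = ["transformer", "vae", "text_encoder"]
dtype_map = {"transformer": "bfloat16", "default": "float16"}

resolved = resolve_dtypes(components, dtype_map)
print(resolved)
```

With the map from the example above, only the transformer resolves to `bfloat16`; every other component picks up the `"default"` entry.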
📦 AutoModel
- This release includes an AutoModel object similar to the one found in `transformers` that automatically fetches the appropriate model class for the provided repo.
- ```python
- from diffusers import AutoModel
- unet = AutoModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
- ```
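The class lookup can be pictured as a small registry keyed by the class name recorded in a model's config. This is a hypothetical sketch, not the `AutoModel` implementation: the registry, the stand-in classes, and the assumption that the config stores its class under a `_class_name` key are all illustrative.

```python
class UNet2DConditionModel:
    """Stand-in class; the real model lives in diffusers."""


class AutoencoderKL:
    """Stand-in class; the real model lives in diffusers."""


# Hypothetical registry mapping class names found in a config to Python classes.
MODEL_REGISTRY = {cls.__name__: cls for cls in (UNet2DConditionModel, AutoencoderKL)}


def resolve_model_class(config):
    """Pick a model class from a config dict (assumes a `_class_name` key)."""
    class_name = config.get("_class_name")
    if class_name not in MODEL_REGISTRY:
        raise ValueError(f"unknown model class: {class_name!r}")
    return MODEL_REGISTRY[class_name]


unet_cls = resolve_model_class({"_class_name": "UNet2DConditionModel", "sample_size": 64})
print(unet_cls.__name__)
```

The caller never names the class; the config found in the requested subfolder decides which one is instantiated.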
📦 All commits
- [Sana 4K] Add vae tiling option to avoid OOM by @leisuzz in #10583
- IP-Adapter for `StableDiffusion3Img2ImgPipeline` by @guiyrt in #10589
- [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 by @chenjy2003 in #10595
- Move buffers to device by @hlky in #10523
- [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint by @guiyrt in #10597
- Scheduling fixes on MPS by @hlky in #10549
- [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo by @chengzeyi in #10544
- NPU adaption for RMSNorm by @leisuzz in #10534
- + 297 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @guiyrt
- IP-Adapter for `StableDiffusion3Img2ImgPipeline` (#10589)
- [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
- `MultiControlNetUnionModel` on SDXL (#10747)
- SD3 IP-Adapter runtime checkpoint conversion (#10718)
- Comprehensive type checking for `from_pretrained` kwargs (#10758)
- Multi IP-Adapter for Flux pipelines (#10867)
- + 96 more
📋 Changes
- Fixes a regression in loading Comfy UI format single file checkpoints for Flux
- Fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models
- Adds `unload_lora_weights` for Flux Control
- Fixes a bug that prevents Hunyuan Video from running with batch size > 1
- Allow Hunyuan Video to load LoRAs created from the original repository code
📦 All commits
- [Single File] Fix loading Flux Dev finetunes with Comfy Prefix by @DN6 in #10545
- [CI] Update HF Token on Fast GPU Model Tests by @DN6 in #10570
- [CI] Update HF Token in Fast GPU Tests by @DN6 in #10568
- Fix batch > 1 in HunyuanVideo by @hlky in #10548
- Fix HunyuanVideo produces NaN on PyTorch<2.5 by @hlky in #10482
- Fix hunyuan video attention mask dim by @a-r-r-o-w in #10454
- [LoRA] Support original format loras for HunyuanVideo by @a-r-r-o-w in #10376
- [LoRA] feat: support loading loras into 4bit quantized Flux models. by @sayakpaul in #10578
- + 4 more
📋 Changes
- Importing Diffusers would raise an error in PyTorch versions lower than 2.3.0. This should no longer be a problem.
- Device Map does not work as expected when using the quantizer. We now raise an error if it is used. Support for using device maps with different quantization backends will be added in the near future.
- Quantization was not performed due to faulty logic. This is now fixed and better tested.
📦 All commits
- make style for https://github.com/huggingface/diffusers/pull/10368 by @yiyixuxu in #10370
- fix test pypi installation in the release workflow by @sayakpaul in #10360
- Fix TorchAO related bugs; revert device_map changes by @a-r-r-o-w in #10371
✨ New Video Generation Pipelines 📹
- Open video generation models are on the rise, and we’re pleased to provide comprehensive integration support for all of them. The following video pipelines are bundled in this release:
- [Mochi-1](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi)
- [Allegro](https://huggingface.co/docs/diffusers/main/en/api/pipelines/allegro)
- [LTXVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video)
- [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video)
- Check out [this section](https://www.notion.so/Diffusers-0-32-0-release-15f1384ebcac8091ac5bf18c128639ab?pvs=21) to learn more about the fine-tuning options available for these new video models.
✨ New Image Generation Pipelines
- SANA
- [Text-to-image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana#diffusers.SanaPipeline)
- [PAG](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana#diffusers.SanaPAGPipeline)
- Flux Control (including Control LoRA)
- [Depth Control](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#depth-control)
- [Canny Control](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#canny-control)
- [Flux Redux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#redux)
- [Flux Fill Inpainting / Outpainting](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#fill-inpaintingoutpainting)
- + 33 more
📦 Acknowledgements
- Shoutout to @lawrence-cj and @chenjy2003 for contributing SANA in [this PR](https://github.com/huggingface/diffusers/pull/9982). SANA also features a Deep Compression Autoencoder, which was contributed by @lawrence-cj in [this PR](https://github.com/huggingface/diffusers/pull/9708).
- Shoutout to @guiyrt for contributing SD3.5 IP Adapter in [this PR](https://github.com/huggingface/diffusers/pull/9987).
✨ New Quantization Backends
- [TorchAO](https://huggingface.co/docs/diffusers/main/en/quantization/torchao)
- [GGUF](https://huggingface.co/docs/diffusers/main/en/quantization/gguf)
- Please be aware of the following caveats:
- TorchAO quantized checkpoints cannot be serialized in `safetensors` currently. This may change in the future.
- GGUF support in this release is limited to loading pre-quantized checkpoints into models. Support for saving models with GGUF quantization will be added in the future.
✨ New training scripts
- This release features many new training scripts for the community to play with:
- [Flux Control](https://github.com/huggingface/diffusers/tree/main/examples/flux-control)
- [Mochi-1](https://github.com/a-r-r-o-w/finetrainers)
- [LTXVideo](https://github.com/a-r-r-o-w/finetrainers?tab=readme-ov-file#quickstart)
- [SANA](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sana.md)
- [Hunyuan Video](https://github.com/a-r-r-o-w/finetrainers?tab=readme-ov-file#quickstart)
📦 All commits
- post-release 0.31.0 by @sayakpaul in #9742
- fix bug in `require_accelerate_version_greater` by @faaany in #9746
- [Official callbacks] SDXL Controlnet CFG Cutoff by @asomoza in #9311
- [SD3-5 dreambooth lora] update model cards by @linoytsaban in #9749
- config attribute not foud error for FluxImagetoImage Pipeline for multi controlnet solved by @rshah240 in #9586
- Some minor updates to the nightly and push workflows by @sayakpaul in #9759
- [Docs] fix docstring typo in SD3 pipeline by @shenzhiy21 in #9765
- [bugfix] bugfix for npu free memory by @leisuzz in #9640
- + 253 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @faaany
- fix bug in `require_accelerate_version_greater` (#9746)
- make `pipelines` tests device-agnostic (part1) (#9399)
- make `pipelines` tests device-agnostic (part2) (#9400)
- @linoytsaban
- [SD3-5 dreambooth lora] update model cards (#9749)
- [SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)
- + 105 more
📦 Stable Diffusion 3.5 Large
- A regular one
- A timestep-distilled one enabling few-step inference
- Make sure to fill out the form on the [model page](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), and then run `huggingface-cli login` before running the code below.
- ```python
- import torch
- from diffusers import StableDiffusion3Pipeline
- pipe = StableDiffusion3Pipeline.from_pretrained(
- "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
- + 12 more
📦 Cogview3-plus
- We added a new text-to-image model, Cogview3-plus, from the THUDM team! The model is DiT-based and supports image generation from 512 to 2048px. Thanks to @zRzRzRzRzRzRzR for contributing it!
- ```python
- from diffusers import CogView3PlusPipeline
- import torch
- pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16).to("cuda")
- pipe.enable_model_cpu_offload()
- pipe.vae.enable_slicing()
- pipe.vae.enable_tiling()
- + 11 more
📦 Quantization
- The example below shows how to run Flux.1 Dev with the NF4 data-type. Make sure you install the libraries:
- ```bash
- pip install -Uq git+https://github.com/huggingface/transformers@main
- pip install -Uq bitsandbytes
- pip install -Uq diffusers
- ```
- ```python
- from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
- + 32 more
📦 Training scripts
- We have a fresh bucket of training scripts with this release:
- [Advanced Flux.1 trainer](https://huggingface.co/blog/linoyts/new-advanced-flux-dreambooth-lora)
- [CogVideoX trainer](https://github.com/huggingface/diffusers/tree/main/examples/cogvideo)
📦 Misc
- We now support the loading of different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs.
- Loading of Xlabs Flux ControlNets is also now supported. Thanks to @Anghellia for contributing it!
📦 All commits
- Feature flux controlnet img2img and inpaint pipeline by @ighoshsubho in #9408
- Remove CogVideoX mentions from single file docs; Test updates by @a-r-r-o-w in #9444
- set max_shard_size to None for pipeline save_pretrained by @a-r-r-o-w in #9447
- adapt masked im2im pipeline for SDXL by @noskill in #7790
- [Flux] add lora integration tests. by @sayakpaul in #9353
- [training] CogVideoX Lora by @a-r-r-o-w in #9302
- Several fixes to Flux ControlNet pipelines by @vladmandic in #9472
- [refactor] LoRA tests by @a-r-r-o-w in #9481
- + 106 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @ighoshsubho
- Feature flux controlnet img2img and inpaint pipeline (#9408)
- flux controlnet control_guidance_start and control_guidance_end implement (#9571)
- @noskill
- adapt masked im2im pipeline for SDXL (#7790)
- @saqlain2204
- [Tests] Reduce the model size in the lumina test (#8985)
- + 43 more
📋 Changes
- CogVideoXImageToVideoPipeline
- CogVideoXVideoToVideoPipeline
📦 CogVideoXImageToVideoPipeline
- The code below demonstrates how to use the new image-to-video pipeline:
- ```python
- import torch
- from diffusers import CogVideoXImageToVideoPipeline
- from diffusers.utils import export_to_video, load_image
- pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16)
- pipe.to("cuda")
- pipe.enable_model_cpu_offload()
- + 13 more
📦 CogVideoXVideoToVideoPipeline
- The code below demonstrates how to use the new video-to-video pipeline:
- ```python
- import torch
- from diffusers import CogVideoXDPMScheduler, CogVideoXVideoToVideoPipeline
- from diffusers.utils import export_to_video, load_video
- pipe = CogVideoXVideoToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-trial", torch_dtype=torch.bfloat16)
- pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config)
- pipe.to("cuda")
- + 21 more
📦 All commits
- [core] Support VideoToVideo with CogVideoX by @a-r-r-o-w in #9333
- [core] CogVideoX memory optimizations in VAE encode by @a-r-r-o-w in #9340
- [CI] Quick fix for Cog Video Test by @DN6 in #9373
- [refactor] move positional embeddings to patch embed layer for CogVideoX by @a-r-r-o-w in #9263
- CogVideoX-5b-I2V support by @zRzRzRzRzRzRzR in #9418
📦 All commits
- update runway repo for single_file by @yiyixuxu in #9323
- Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1 by @DN6 in #9280
- [IP Adapter] Fix cache_dir and local_files_only for image encoder by @asomoza in #9272
📦 CogVideoX-5B
- The code below shows how to generate a video with CogVideoX-5B:
- ```python
- import torch
- from diffusers import CogVideoXPipeline
- from diffusers.utils import export_to_video
- pipe = CogVideoXPipeline.from_pretrained(
- "THUDM/CogVideoX-5b",
- torch_dtype=torch.bfloat16
- + 14 more
📦 All commits
- Update Video Loading/Export to use `imageio` by @DN6 in #9094
- [refactor] CogVideoX followups + tiled decoding support by @a-r-r-o-w in #9150
- Add Learned PE selection for Auraflow by @cloneofsimo in #9182
- [Single File] Fix configuring scheduler via legacy kwargs by @DN6 in #9229
- [Flux LoRA] support parsing alpha from a flux lora state dict. by @sayakpaul in #9236
- [tests] fix broken xformers tests by @a-r-r-o-w in #9206
- Cogvideox-5B Model adapter change by @zRzRzRzRzRzRzR in #9203
- [Single File] Support loading Comfy UI Flux checkpoints by @DN6 in #9243
✨ New pipelines
- Image taken from the [Lumina’s GitHub](https://github.com/Alpha-VLLM/Lumina-T2X/blob/main/assets/lumina-next.pdf).
- This release features many new pipelines. Below, we provide a list:
- Audio pipelines 🎼
- [Stable Audio](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_audio)
- Video pipelines 📹
- [Latte](https://huggingface.co/docs/diffusers/main/en/api/pipelines/latte) (thanks to @maxin-cn for the contribution through #8404)
- [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox) (thanks to @zRzRzRzRzRzRzR for the contribution through #9082)
- + 11 more
📦 Perturbed Attention Guidance (PAG)
- _(Side-by-side image comparison: without PAG vs. with PAG.)_
- `StableDiffusionPAGPipeline`
- `StableDiffusion3PAGPipeline`
- `StableDiffusionControlNetPAGPipeline`
- `StableDiffusionXLPAGPipeline`
- `StableDiffusionXLPAGImg2ImgPipeline`
- + 9 more
📦 AnimateDiff with SparseCtrl
- There are two SparseCtrl-specific checkpoints and a Motion LoRA made available by the authors, namely:
- [SparseCtrl Scribble](https://huggingface.co/guoyww/animatediff-sparsectrl-scribble)
- [SparseCtrl RGB](https://huggingface.co/guoyww/animatediff-sparsectrl-rgb)
- [Motion LoRA v1-5-3](https://huggingface.co/guoyww/animatediff-motion-lora-v1-5-3)
- Scribble Interpolation Example:
- <table>
- <tr>
- <td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png" alt="Image 1"></td>
- + 46 more
📦 FreeNoise for AnimateDiff
- FreeNoise is a training-free method that allows extending the generative capabilities of pretrained video diffusion models beyond their existing context/frame limits.
- ```python
- import torch
- from diffusers import AnimateDiffPipeline, MotionAdapter, EulerAncestralDiscreteScheduler
- from diffusers.utils import export_to_gif
- adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
- pipe = AnimateDiffPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
- pipe.scheduler = EulerAncestralDiscreteScheduler(
- + 16 more
♻️ LoRA refactor
- To learn more details, please follow [this PR](https://github.com/huggingface/diffusers/pull/8774). If you see any LoRA-related issues stemming from these refactors, please open an issue.
📦 All commits
- [Release notification] add some info when there is an error. by @sayakpaul in #8718
- Modify FlowMatch Scale Noise by @asomoza in #8678
- Fix json WindowsPath crash by @vincedovy in #8662
- Motion Model / Adapter versatility by @Arlaz in #8301
- [Chore] perform better deprecation for vqmodeloutput by @sayakpaul in #8719
- [Advanced dreambooth lora] adjustments to align with canonical script by @linoytsaban in #8406
- [Tests] Fix precision related issues in slow pipeline tests by @DN6 in #8720
- fix: ValueError when using FromOriginalModelMixin in subclasses #8440 by @fkcptlst in #8454
- + 149 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @DN6
- [Tests] Fix precision related issues in slow pipeline tests (#8720)
- Remove legacy single file model loading mixins (#8754)
- Enforce ordering when running Pipeline slow tests (#8763)
- Fix warning in UNetMotionModel (#8756)
- Fix indent in dreambooth lora advanced SD 15 script (#8753)
- Fix mistake in Single File Docs page (#8765)
- + 54 more
📦 All commits
- [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in #8558
- [LoRA] refactor lora conversion utility. by @sayakpaul in #8295
- [LoRA] fix conversion utility so that lora dora loads correctly by @sayakpaul in #8688
- [Chore] remove deprecation from transformer2d regarding the output class. by @sayakpaul in #8698
- [LoRA] fix vanilla fine-tuned lora loading. by @sayakpaul in #8691
- Release: v0.29.2 by @sayakpaul (direct commit on v0.29.2-patch)
📦 SD3 ControlNet
- <img width="624" alt="image" src="https://github.com/huggingface/diffusers/assets/46553287/db384753-cfbb-488c-bc74-8280f9bee24e">
- ```python
- import torch
- from diffusers import StableDiffusion3ControlNetPipeline
- from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
- from diffusers.utils import load_image
- controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)
- pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
- + 10 more
📦 Expanded single file support
- We now support all available single-file checkpoints for SD3 in Diffusers! To load the single-file checkpoint with T5:
- ```python
- import torch
- from diffusers import StableDiffusion3Pipeline
- pipe = StableDiffusion3Pipeline.from_single_file(
- "https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
- torch_dtype=torch.float16,
- )
- + 4 more
📦 Using Long Prompts with the T5 Text Encoder
- ```python
- image = pipe(
- prompt=prompt,
- negative_prompt="",
- num_inference_steps=28,
- guidance_scale=4.5,
- max_sequence_length=512,
- ).images[0]
- + 3 more
📦 All commits
- Release: v0.29.0 by @sayakpaul (direct commit on v0.29.1-patch)
- prepare for patch release by @yiyixuxu (direct commit on v0.29.1-patch)
- fix warning log for Transformer SD3 by @sayakpaul in #8496
- Add SD3 AutoPipeline mappings by @Beinsezii in #8489
- Add Hunyuan AutoPipe mapping by @Beinsezii in #8505
- Expand Single File support in SD3 Pipeline by @DN6 in #8517
- [Single File Loading] Handle unexpected keys in CLIP models when `accelerate` isn't installed. by @DN6 in #8462
- Fix sharding when no device_map is passed by @SunMarc in #8531
- + 5 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @wangqixun
- Support SD3 ControlNet and Multi-ControlNet. (#8566)
This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2403.03206) by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.

As the model is gated, before using it with `diffusers`, you first need to go to the [Stable Diffusion 3 Medium Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate.

```bash
huggingface-cli login
```

The code below shows how to perform text-to-image generation with SD3:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image
```

Refer to [our documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) for learning all the optimizations you can apply to SD3 as well as the image-to-image pipeline.

Additionally, we support DreamBooth + LoRA fine-tuning of Stable Diffusion 3 through rectified flow. Check out [this directory](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sd3.md) for more details.
📋 Changes
- Change checkpoint key used to identify CLIP models in single file checkpoints by @DN6 in #8319
📦 Hunyuan DiT
- ```python
- import torch
- from diffusers import HunyuanDiTPipeline
- pipe = HunyuanDiTPipeline.from_pretrained(
- "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
- )
- pipe.to("cuda")
- + 6 more
📦 All commits
- Release: v0.28.0 by @sayakpaul (direct commit on v0.28.1-patch)
- [Core] Introduce class variants for `Transformer2DModel` by @sayakpaul in #7647
- resolve comflicts by @toshas (direct commit on v0.28.1-patch)
- Tencent Hunyuan Team: add HunyuanDiT related updates by @gnobitab in #8240
- Tencent Hunyuan Team - Updated Doc for HunyuanDiT by @gnobitab in #8383
- [Transformer2DModel] Handle `norm_type` safely while remapping by @sayakpaul in #8370
- Release: v0.28.1 by @sayakpaul (direct commit on v0.28.1-patch)
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @gnobitab
- Tencent Hunyuan Team: add HunyuanDiT related updates (#8240)
- Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)
📦 Marigold
- _(Image taken from the [official repository](https://github.com/prs-eth/Marigold))_
- The code snippet below shows how to use this pipeline for depth estimation:
- ```python
- import diffusers
- import torch
- pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
- "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
- + 9 more
♻️ 🌀 Massive Refactor of `from_single_file` 🌀
- Some of the changes introduced in this refactor:
- ```python
- pipe = StableDiffusionPipeline.from_single_file("...", config=<model repo id or local repo path>)
- ```
📦 PixArt Sigma
- <div align="center">
- <img src="https://github.com/huggingface/diffusers/assets/22957388/31f2b30b-e46f-4fc9-aeb7-a6dea50b474b" width=700/><br>
- <small>(Taken from the <a href="https://pixart-alpha.github.io/PixArt-sigma-project">project website</a>.)</small>
- </div>
- <br>
- ```python
- import torch
- from diffusers import PixArtSigmaPipeline
- + 9 more
📦 AnimateDiff SDXL
- ```python
- import torch
- from diffusers.models import MotionAdapter
- from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
- from diffusers.utils import export_to_gif
- adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)
- model_id = "stabilityai/stable-diffusion-xl-base-1.0"
- scheduler = DDIMScheduler.from_pretrained(
- + 28 more
📦 Block-wise LoRA
- ```python
- ...
- adapter_weight_scales = { "unet": { "down": 0, "mid": 1, "up": 0} }
- pipe.set_adapters("pixel", adapter_weight_scales)
- image = pipe(
- prompt, num_inference_steps=30, generator=torch.manual_seed(0)
- ).images[0]
- ```
- + 1 more
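To make the shape of `adapter_weight_scales` concrete, the sketch below expands a block-wise scale dictionary into one scale per named block. `expand_block_scales` and the block names are hypothetical, and the rule that unlisted groups keep a scale of 1.0 is an assumption for illustration rather than documented library behavior.

```python
def expand_block_scales(scales, block_names, default=1.0):
    """Map each block name (e.g. "down.0") to the scale of its group ("down")."""
    expanded = {}
    for name in block_names:
        # The group is the part of the name before the first dot.
        group = name.split(".", 1)[0]
        # Assumption: groups missing from `scales` keep the default scale.
        expanded[name] = float(scales.get(group, default))
    return expanded


unet_scales = {"down": 0, "mid": 1, "up": 0}
blocks = ["down.0", "down.1", "mid", "up.0", "up.1"]

per_block = expand_block_scales(unet_scales, blocks)
print(per_block)
```

With the scales from the snippet above, only the mid block keeps the adapter active; the down and up blocks run with the LoRA contribution zeroed out.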
📦 InstantStyle
- ```python
- ...
- scale = {
- "down": {"block_2": [0.0, 1.0]},
- "up": {"block_0": [0.0, 1.0, 0.0]},
- }
- pipeline.set_ip_adapter_scale(scale)
- ```
- + 1 more
📦 ControlNetXS
- Thanks to @UmerHA for contributing ControlNet-XS in #5827 and #6772.
📦 Custom Timesteps
- ```python
- import torch
- from diffusers import StableDiffusionXLPipeline
- from diffusers.schedulers import AysSchedules
- sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"]
- pipe = StableDiffusionXLPipeline.from_pretrained(
- "SG161222/RealVisXL_V4.0",
- torch_dtype=torch.float16,
- variant="fp16",
- ).to("cuda")
- + 5 more
📦 `device_map` in Pipelines 🧪
- ```python
- from diffusers import DiffusionPipeline
- import torch
- pipeline = DiffusionPipeline.from_pretrained(
- "runwayml/stable-diffusion-v1-5",
- torch_dtype=torch.float16,
- device_map="balanced"
- )
- + 16 more
✨ New Guides 📑
- [ControlNet Outpainting](https://huggingface.co/blog/OzzyGT/outpainting-controlnet): Learn how to do outpainting with a specific [ControlNet model](https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl) trained for this task. This method is best for creative outpainting.
- [Differential Diffusion Outpainting](https://huggingface.co/blog/OzzyGT/outpainting-differential-diffusion): Use a [novel framework](https://github.com/exx8/differential-diffusion) that enables customization of the amount of change per pixel or per image region, allowing seamless outpainting. This can be used for expanding images beyond their initial size.
- [Outpainting using an Inpaint Model](https://huggingface.co/blog/OzzyGT/outpainting-inpaint-model): Using various techniques, learn how to use a regular inpainting model to do outpainting while preserving the original subject intact. This is ideal for product catalogs.
📦 Official Callbacks
- We introduced official callbacks that you can conveniently plug into your pipeline. For example, `SDXLCFGCutoffCallback` turns off classifier-free guidance after a set fraction of the denoising steps.
- ```python
- import torch
- from diffusers import StableDiffusionXLPipeline
- from diffusers.callbacks import SDXLCFGCutoffCallback
- callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
- pipeline = StableDiffusionXLPipeline.from_pretrained(
- "stabilityai/stable-diffusion-xl-base-1.0",
- + 11 more
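The effect of a cutoff ratio can be sketched as a per-step guidance schedule. The helper below is hypothetical and makes two assumptions that may not match the callback exactly: the cutoff step is `int(num_steps * cutoff_step_ratio)`, and guidance is set to 0.0 for every step after that one.

```python
def guidance_schedule(num_steps, guidance_scale, cutoff_step_ratio):
    """Return the guidance scale used at each step under a CFG cutoff."""
    # Assumption: the cutoff step is the ratio applied to the step count, truncated.
    cutoff_step = int(num_steps * cutoff_step_ratio)
    # Assumption: guidance stays on through the cutoff step and is 0.0 afterwards.
    return [guidance_scale if step <= cutoff_step else 0.0 for step in range(num_steps)]


schedule = guidance_schedule(num_steps=10, guidance_scale=6.5, cutoff_step_ratio=0.4)
print(schedule)
```

Once the scale drops to zero the unconditional branch is no longer needed, which is where the speed-up from cutting off classifier-free guidance comes from.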
📦 Community Pipelines and `from_pipe` API
- Read more about `from_pipe` API in our [documentation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading#reuse-a-pipeline) 📃.
- Here are four new community pipelines since our last release.
📦 BoxDiff
- ```python
- pipe_box = DiffusionPipeline.from_pipe(
- pipe_sd,
- custom_pipeline="pipeline_stable_diffusion_boxdiff",
- )
- pipe_box.enable_model_cpu_offload()
- phrases = ["aurora","reindeer","meadow","lake","mountain"]
- boxes = [[1,3,512,202], [75,344,421,495], [1,327,508,507], [2,217,507,341], [1,135,509,242]]
- + 15 more
📦 HD-Painter
- ```python
- pipe = DiffusionPipeline.from_pipe(
- pipe_box,
- custom_pipeline="hd_painter"
- )
- pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
- prompt = "wooden boat"
- init_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/images/2.jpg")
- + 4 more
📦 Differential Diffusion
- ```python
- pipeline = DiffusionPipeline.from_pipe(
- pipe_sdxl,
- custom_pipeline="pipeline_stable_diffusion_xl_differential_img2img",
- ).to("cuda")
- pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)
- prompt = "a green pear"
- negative_prompt = "blurry"
- + 12 more
📦 All Commits
- clean dep installation step in push_tests by @sayakpaul in #7382
- [LoRA test suite] refactor the test suite and cleanse it by @sayakpaul in #7316
- [Custom Pipelines with Custom Components] fix multiple things by @sayakpaul in #7304
- Fix typos by @standardAI in #7411
- fix: enable unet_3d_condition to support time_cond_proj_dim by @yhZhai in #7364
- add: space within docs to calculate mememory usage. by @sayakpaul (direct commit on v0.28.0-release)
- Revert "add: space within docs to calculate mememory usage." by @sayakpaul (direct commit on v0.28.0-release)
- [Docs] add missing output image by @sayakpaul in #7425
- + 232 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @standardAI
- Fix typos (#7411)
- [`IP-Adapter`] Fix IP-Adapter Support and Refactor Callback for `StableDiffusionPanoramaPipeline` (#7262)
- [`Docs`] Fix typos (#7451)
- Fix Tiling in `ConsistencyDecoderVAE` (#7290)
- Fix CPU offload in docstring (#7827)
- Fix image upcasting (#7858)
- + 46 more
📦 All commits
- [scheduler] fix a bug in add_noise by @yiyixuxu in https://github.com/huggingface/diffusers/pull/7386
- [LoRA] fix cross_attention_kwargs problems and tighten tests by @sayakpaul in https://github.com/huggingface/diffusers/pull/7388
- Fix issue with prompt embeds and latents in SD Cascade Decoder with multiple image embeddings for a single prompt. by @DN6 in https://github.com/huggingface/diffusers/pull/7381
📦 All commits
- Release: v0.27.0 by @DN6 (direct commit on v0.27.1-patch)
- [LoRA] pop the LoRA scale so that it doesn't get propagated to the weeds by @sayakpaul in #7338
- Release: 0.27.1-patch by @sayakpaul (direct commit on v0.27.1-patch)
📦 Stable Cascade
- ```python
- from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
- import torch
- prior = StableCascadePriorPipeline.from_pretrained(
- "stabilityai/stable-cascade-prior",
- torch_dtype=torch.bfloat16,
- ).to("cuda")
- prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
- + 10 more
📦 Playground v2.5
- ```python
- from diffusers import DiffusionPipeline
- import torch
- pipe = DiffusionPipeline.from_pretrained(
- "playgroundai/playground-v2.5-1024px-aesthetic",
- torch_dtype=torch.float16,
- variant="fp16",
- ).to("cuda")
- + 38 more
📦 EDM-style training support
- To train `stabilityai/stable-diffusion-xl-base-1.0` using the EDM formulation, you just have to specify the `--do_edm_style_training` flag in your training command, and voila 🤗
- If you’re interested in extending this formulation to other training scripts, we refer you to [this PR](https://github.com/huggingface/diffusers/pull/7126).
📦 Trajectory Consistency Distillation
- ```python
- import torch
- from diffusers import StableDiffusionXLPipeline, TCDScheduler
- device = "cuda"
- base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
- tcd_lora_id = "h1t/TCD-SDXL-LoRA"
- pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
- pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
- + 13 more
📦 IP-Adapter image embeddings and masking
- 📜 To learn the exact usage of both features, refer to our [official guide](https://huggingface.co/docs/diffusers/main/en/using-diffusers/ip_adapter).
- We thank our community members, @fabiorigano, @asomoza, and @cubiq, for their guidance and input on these features.
📦 Guide on merging LoRAs
- Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the `set_adapters` method, which concatenates the weights of the LoRAs to merge.
- 📜 Take a look at the [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras) guide to learn more about merging in Diffusers.
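One way to see why concatenation can act as a merge: stacking the low-rank factors of two LoRAs along the rank dimension produces the same update as a weighted sum of the individual updates. The check below uses made-up 2x2 matrices in pure Python; the helper names and the choice to fold each weight into the `B` factor are assumptions for illustration, not a description of Diffusers or PEFT internals.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def weighted_sum(m1, w1, m2, w2):
    """Element-wise w1 * m1 + w2 * m2 for two same-shaped matrices."""
    return [[w1 * x + w2 * y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Two rank-1 LoRAs for a 2x2 layer: delta_i = B_i @ A_i
B1, A1 = [[1.0], [2.0]], [[3.0, 4.0]]
B2, A2 = [[0.5], [-1.0]], [[2.0, 0.0]]
w1, w2 = 0.7, 0.3

# Weighted sum of the two full-size updates.
summed = weighted_sum(matmul(B1, A1), w1, matmul(B2, A2), w2)

# Same update from concatenated factors: B = [w1*B1 | w2*B2], A = [A1; A2]
B_cat = [[w1 * b1[0], w2 * b2[0]] for b1, b2 in zip(B1, B2)]
A_cat = A1 + A2
merged = matmul(B_cat, A_cat)

print(summed)
print(merged)
```

The concatenated form has rank equal to the sum of the individual ranks, so the merged adapter stays low-rank while carrying both updates.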
📦 LEDITS++
- The code snippet below shows example usage:
- ```python
- import torch
- import PIL
- import requests
- from io import BytesIO
- from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL
- device = "cuda"
- + 31 more
📦 All commits
- Fix flaky IP Adapter test by @DN6 in #6960
- Move SDXL T2I Adapter lora test into PEFT workflow by @DN6 in #6965
- Allow passing `config_file` argument to ControlNetModel when using `from_single_file` by @DN6 in #6959
- [`PEFT` / `docs`] Add a note about torch.compile by @younesbelkada in #6864
- [Core] Harmonize single file ckpt model loading by @sayakpaul in #6971
- fix: controlnet inpaint single file. by @sayakpaul in #6975
- [docs] IP-Adapter by @stevhliu in #6897
- fix IPAdapter unload_ip_adapter test by @yiyixuxu in #6972
- + 161 more
📦 Significant community contributions
- The following contributors have made significant changes to the library over the last release:
- @ihkap11
- Fix diffusers import prompt2prompt (#6927)
- Refactor Prompt2Prompt: Inherit from DiffusionPipeline (#7211)
- @ustcuna
- [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU (#6683)
- @rootonchair
- IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline (#6941)
- + 29 more
📦 All commits
- Fix configuring VAE from single file mixin by @DN6 in #6950
- [DPMSolverSinglestepScheduler] correct `get_order_list` for `solver_order=2`and `lower_order_final=True` by @yiyixuxu in #6953
📦 All commits
- add `self.use_ada_layer_norm_*` params back to `BasicTransformerBlock` by @yiyixuxu in #6841
📦 All commits
- add is_torchvision_available by @yiyixuxu in #6800