Repositories tagged with "inference-acceleration"
TurboDiffusion
thu-ml
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2–5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
TeaCache
ali-vilab
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention method that accelerates inference for any model.
SLA
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
EasyCache
H-EmbodVis
Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching
Discrete-Diffusion-Forcing
SJTU-DENG-Lab
Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference
AsyncDiff
czg1225
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
nos
autonomi-ai
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on any AI hardware.
KsanaDiT
Tencent
KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation
Q-LLM
JIA-Lab-research
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"