
qualcomm/aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

30 Releases

Latest: 1w ago

Version 2.32.1 (Latest)

aimet-bot·1w ago·June 4, 2026


**Full Changelog**: https://github.com/qualcomm/aimet/compare/2.32.0...2.32.1

Version 2.32.0

aimet-bot·1w ago·June 3, 2026


📋 Changes

Bug fixes and Improvements
ONNX
Add C++ support for bfloat16 quantization (ca7d3e01b)
Fix large model support with protobuf 7.x (9ef22519a)
Skip QDQ pair scale/zp in duplicate_shared_initializers (05e8332ce)
Handle Identity passthrough in duplicate_shared_initializers (1b27d9841)
Fix SpinQuant embed_tokens filter to exclude non-embedding Gathers (81b80411a)
Inline fused supergroups after encoding propagation (68fdcb673)
+ 4 more

Version 2.31.0

bhushan23·3w ago·May 20, 2026


📋 Changes

New Features
ONNX
Support Qwen 3VL in AdaScale ONNX (35d2440db)
Torch
Add Gemma 3 support for AdaScale (a2da0de9c)
LoRA integration (0b90d8a4f)
Removed Features
Torch
+ 17 more

Version 2.30.0

aimetci·1mo ago·May 5, 2026


📋 Changes

New Features
ONNX
Extend SpinQuant support for Vision-Language Models (VLM) (e5cd62847)
Torch
Remove legacy aimet_torch v1. v2 is now the sole API (4192da749, 00e3c7220)
Bug fixes and Improvements
ONNX
Improve set_and_freeze_param_encodings (39bf1f69b, d320cae7e)
+ 7 more

Version 2.29.0

aimetci·1mo ago·April 20, 2026


📋 Changes

New Features
ONNX
Add support for Qwen 2.5 VL in aimet-onnx (f25668610)
Torch
Support OOTB quantization of nn.MultiHeadAttention (4d19f470f)
Support OOTB quantization of Qwen 3.5 normalization layers (01b912f65)
Support OOTB quantization of InternVL GELU (c5f65b782)
Bug fixes and Improvements
+ 11 more

Version 2.28.0

aimetci·2mo ago·April 6, 2026


📋 Changes

New Features
Torch
Add resumable checkpointing for AdaScale optimization (20ecb0a)
Common
Migrate pybind11 bindings to Cython using Python's Stable ABI to enable Python-version-independent wheels (0d6f856)
Bug fixes and Improvements
Torch
Fix rescale encodings not propagating with shared scale values (d9f3a90)
+ 2 more

Version 2.27.0

aimetci·2mo ago·March 26, 2026


📋 Changes

Bug fixes and Improvements
ONNX
Add `force_activation_as` option to export APIs to control activation signedness (3583462)
Torch
Reduce quantize-dequantize latency overhead (9ca3bf4, 525e993, b3de9a2)
Optimize inference speed for GenAITests models (cacd5cc, b6ea5bd, 30ab60a)
Allow checkpointing and loading during SeqMSE optimization (4eb97f0)
Fix SeqMSE error when model contains unquantized Conv/Linear layers (3dd4ca9)
+ 4 more
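
For readers unfamiliar with the term, the quantize-dequantize step mentioned in the 2.27.0 notes can be sketched as below. This is a minimal, library-free illustration of affine quantization simulation, not AIMET's implementation; the function name and parameters are hypothetical.

```python
def quantize_dequantize(x, scale, zero_point, bitwidth=8):
    """Simulate affine integer quantization of one float value.

    Hypothetical helper for illustration only; not an AIMET function.
    """
    qmin, qmax = 0, 2 ** bitwidth - 1
    # Quantize: map onto the integer grid, then clamp to the representable range.
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))
    # Dequantize: map the integer back to a float on the quantization grid.
    return (q - zero_point) * scale


values = [-1.0, -0.26, 0.0, 0.3, 2.0]
simulated = [
    quantize_dequantize(v, scale=0.5, zero_point=2, bitwidth=2) for v in values
]
print(simulated)
```

With a 2-bit grid, every input snaps to one of four levels, and out-of-range inputs such as 2.0 saturate at the largest level. A quantization simulator applies this round trip at many points in a model, which is why its latency overhead matters.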

Version 2.26.0

aimetci·3mo ago·March 9, 2026


📋 Changes

Bug fixes and Improvements
ONNX
Implement onnxscript RMSNorm fusion for improved graph optimization (68710d9)
Propagate encoding through Concat during ONNX QDQ export (4811a34)
Export scale & offset as initializers instead of Constants in ONNX QDQ export (ea9a619)
Fix AdaScale (aimet-onnx) for Qwen3 models (beac8f8)
Fix BN fold for YOLO models (bae9953)
Torch
+ 20 more

Version 2.25.1

aimetci·3mo ago·March 3, 2026


📋 Changes

Bug fixes and Improvements
ONNX
Fix for encoding propagation for concat layers (5084af3)
Torch
Fix to reduce GPU RAM usage for AdaScale for Qwen 3 VL model (ee3d193)

Version 2.25.0

aimetci·3mo ago·February 25, 2026


📋 Changes

Bug fixes and Improvements
ONNX
Reduced peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
Reduced peak CUDA memory usage for AdaScale technique (a29f44f)
Added support for Qwen3 VL models in GenAITests (c014961)
ONNX-IR based supergroup pattern detection and replacement (9972c1b)
Tie concat and interpolation ops by default (a8ac6f4)
Torch
+ 7 more

Version 2.24.0

aimetci·4mo ago·February 10, 2026


📋 Changes

Bug fixes and Improvements
ONNX
Add Windows ARM64 wheel build/test support, distribute Windows ARM64 wheel on GitHub releases (1390b96)
Add transpose MatMul support in Sequential MSE (ff7a284)
Torch
Expose block-level AdaScale API (72246db)
Improve numerical stability of zero point shifting ([-1.5, -.5, .5, 1.5]) implementation (489f7df)
Fix `replace_lora_layers_with_quantizable_layers` to inherit train/eval flag (af5a82d)
+ 6 more
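
The grid `[-1.5, -.5, .5, 1.5]` in the 2.24.0 notes can be read as a 2-bit quantization grid shifted by half a step, so that the levels sit symmetrically around zero. The sketch below illustrates that reading only; it is an assumption about the entry's meaning, not AIMET's implementation, and both function names are hypothetical.

```python
import math


def shifted_grid(bitwidth, scale=1.0):
    """Return the half-step-shifted levels for a given bitwidth (hypothetical helper)."""
    half = 2 ** (bitwidth - 1)
    return [(q + 0.5) * scale for q in range(-half, half)]


def quantize_dequantize_shifted(x, bitwidth, scale=1.0):
    """Snap x to the nearest level of the shifted grid (hypothetical helper)."""
    half = 2 ** (bitwidth - 1)
    # Each level (q + 0.5) * scale is the midpoint of the bin [q, q + 1) * scale,
    # so flooring picks the nearest level; clamping saturates out-of-range inputs.
    q = math.floor(x / scale)
    q = max(-half, min(half - 1, q))
    return (q + 0.5) * scale


grid = shifted_grid(2)
print(grid)
print([quantize_dequantize_shifted(v, 2) for v in (-5.0, -0.3, 0.3, 5.0)])
```

Unlike an ordinary integer grid, this grid has no exact representation of zero; in exchange, the positive and negative ranges are equal in size.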

Version 2.23.0

aimetci·4mo ago·January 28, 2026


📋 Changes

Bug fixes and Improvements
ONNX
Disable per-channel quantization for ConvTranspose ops (9395e32)
New top level API for configuring parameter quantization type (a1c197d)
Torch
Enable Torch Dynamo ONNX export (59e0125)
Common
Enable per-channel matmul quantization in config files (7137849)
+ 2 more

Version 2.22.0

aimetci·5mo ago·January 13, 2026


📋 Changes

Bug fixes and Improvements
ONNX
Allow loading 2.0.0 encoding format to sim (e8cb098)
Fix Cast unpacking error (6761a19)
Enable exporting non-LPBQ encodings with zero_point shift (7b3cc4c)
Implement aimet-onnx LPBQEncoding (5ad7ea6)
Common
Support exporting 1x1 Conv LPBQ to ONNX QDQ (58ce71d)

Version 2.21.0

aimetci·6mo ago·December 15, 2025


📋 Changes

Bug fixes and Improvements
ONNX
Fix IndexError when Conv or Linear layers are reused in the model (65c4b3b)
Add optional argument `export_int32_bias` to aimet-onnx export (3b8e0f0)
Unpin PyTorch version in aimet-onnx (d99b6c4)
Align NaN handling with ORT CPU Execution Provider (e4c49eb)
Fix quantization axis handling for transposed MatMul operations (6ca06d6)
PyTorch
+ 1 more

Version 2.20.0

aimetci·6mo ago·December 2, 2025


📋 Changes

Bug fixes and Improvements
Common
Update supported python version to >=3.10 ([2bc8c94](https://github.com/quic/aimet/commit/2bc8c94fcced5ceff790f2c8a0b8347ee42f0be1))
Repackage aimet_common as alias to aimet_onnx.common or aimet_torch.common ([074e85f](https://github.com/quic/aimet/commit/074e85fd15b92c2b65b03059374a5272f07bdeb5))
Remove Pad op from data movement ops ([21cddb6](https://github.com/quic/aimet/commit/21cddb68889e3d01843de8744e8493f6daa3db28))
ONNX
Export data movement op output encoding in sim.export by default ([550c029](https://github.com/quic/aimet/commit/550c0291d074626e555db6b6a5fa3239f333787e))
Assign generic node names if node name is missing or duplicate ([273dd82](https://github.com/quic/aimet/commit/273dd8202489205ff39d20d52a227053ee6cd2e6))
+ 18 more

Version 2.19.0

aimetci·6mo ago·November 19, 2025


📋 Changes

New Features
Bug fixes and Improvements
ONNX
Make LiteMP API percentage float (69f96ff)
Set layernorm int16 weight to symmetric by default (8560e13)
Automatically insert data movement op output qdq during to_onnx_qdq (15c8b9b)
Create LazyExtractor to handle external data for onnx Extractor utils (104e7e8)
Tie input/output encodings across maximum Concat subgraph (832ea91)
+ 6 more

Version 2.18.0

aimetci·7mo ago·November 6, 2025


📋 Changes

New Features
Torch
Promoted aimet_torch.onnx.export and QuantizationSimModel.onnx.export as production APIs (99160d2, e026fd1)
Added utility functions to exclude some or all unknown nn.Modules from quantization (5a419f3, 501eebd)
Bug fixes and Improvements
ONNX
Fixed supergroup misidentification bug upon MatMul-MatMul-Add sequence (ab63866)
Torch
+ 4 more

Version 2.17.0

aimetci·7mo ago·October 20, 2025


📋 Changes

Bug fixes and Improvements
ONNX
Optimize SeqMSE latency and CPU memory usage (434ac6b)
Support excluding nodes from SeqMSE optimization (6a37239)
Support exporting large models (> 2GB) to ONNX QDQ (b1dafe6, 1bf8b82)
Support exporting float16 ONNX models to ONNX QDQ (66ccb45)
Allow disabling MatMul-Add supergroup via config file (e49660c)
Fix bug where on-disk tensor data is deleted before InferenceSession (d57a934)
+ 7 more

Version 2.16.0

aimetci·8mo ago·October 7, 2025


📋 Changes

ONNX
Experimental - Added AdaScale, a post-training quantization technique ([5e23ceb](https://github.com/quic/aimet/commit/5e23cebea551c074f7a380ef2f385fd95433bb53))
ONNX
Skip tying Concat input/output quantizers with conflicting encoding constraints ([b924107](https://github.com/quic/aimet/commit/b9241073256c4a455426451efbc1f3d0672e37b2))
Small updates to FPT Quant for improved accuracy ([ba10947](https://github.com/quic/aimet/commit/ba10947bdbdecdf2980f076560453991c3888e77))
Implement partial encoding freezing mechanism in aimet-onnx ([658ec3c](https://github.com/quic/aimet/commit/658ec3c20be379b582321171e28f92e8fab1102b))
Add Relu partial encoding constraints to HTP config files ([dc8d978](https://github.com/quic/aimet/commit/dc8d978f672e5a93ecb5c8de64017ccaf949d2bf))
Clear encoding analyzer stats after computing param encodings ([3d4725f](https://github.com/quic/aimet/commit/3d4725fc172bffeadd87ee993b7a30e5d51691b2))
+ 5 more

Version 2.15.1

aimetci·8mo ago·September 27, 2025


📋 Changes

ONNX
Experimental - Added AdaScale, a post-training quantization technique ([5e23ceb](https://github.com/quic/aimet/commit/5e23cebea551c074f7a380ef2f385fd95433bb53))

Version 2.15.0

aimetci·8mo ago·September 22, 2025


📋 Changes

Bug fixes and Improvements
ONNX
Throw an error on `bfloat16` models (5181860)
Added docs and examples for LiteMP (3d5e0dd)
Export to QDQ ONNX with pre-quantized constants (a97354f)
PyTorch
Fix multiple dispatch issue when torch function is called in nested context manager (6216ca0)
Keras
+ 3 more

Version 2.14.0

aimetci·9mo ago·September 8, 2025


📋 Changes

New Feature
ONNX
Add support for FP16 in `QuantizationSimModel` (2494d90)
Bug fixes and Improvements
ONNX
Add sequential MSE support for `onnx >= 1.18.0` (754d030)
Improve histogram granularity during TFE calibration (91109af)
Improve runtime for `QuantizationSimModel` creation for large models like LLMs (f7e700f)
+ 9 more

Version 2.13.0

aimetci·9mo ago·August 26, 2025


📋 Changes

Bug fixes and Improvements
ONNX
Adjust weight scale for int32 bias overflow in W16A16 quantization (f39c0bf)
AutoQuant: Remove deprecated feature (414cdde)
Support exporting large models in aimet-onnx (0fe6701)
AdaRound: Delete deprecated top-level API. (bfba557)
AdaRound: Skip optimization if no input to layer (18dfedc)
PyTorch
+ 4 more

Version 2.12.0

aimetci·10mo ago·August 13, 2025


📋 Changes

Bug fixes and Improvements
Common
Remove data movement ops from config (ae02aa8)
ONNX
Exclude bias from quantization when weights are not quantized (62f5879)
AdaRound: Fix prelu failing in CUDA model (b2350b2)
PyTorch
Wrap aimet_torch.onnx.export with torch.no_grad (b73bb71)
+ 3 more

Version 2.11.0

aimetci·10mo ago·July 29, 2025


📋 Changes

New Feature
PyTorch
SpinQuant (experimental) - implement SpinQuant PTQ technique (https://arxiv.org/pdf/2405.16406) for Llama, Qwen2, and Mistral families (R1 rotation w/o optimization) (7364b37)
Enable AdaScale and OmniQuant for Mistral (d33e98c)
ONNX
Enable llm_configurator for Llama (Experimental) (08c17b8)
Bug fixes and Improvements
Common
+ 22 more

Version 2.10.0

aimetci·11mo ago·July 14, 2025


📋 What's Changed

New Feature
Promote to_onnx_qdq to a public API (f333188). Note: this is currently a beta feature.
Bug fixes and Improvements
Common
Added hover tooltip to plot per layer sensitivity. Changed x-axis to plot layer indices instead of names (c96894f)
PyTorch
Implement scaling factor in aimet-torch float QDQ (9b8c655)
Fix CustomSiLU bug (499df9f)
+ 20 more

Version 2.9.0

aimetci·11mo ago·July 1, 2025


📋 What's Changed

Bug Fixes and Improvements
ONNX
Rename QuantizeLinear outputs from <...>_int to <...>_q in onnx QDQ export (e78dbec)
Preserve I/O names in onnx QDQ export (35ad990)
Allow freezing loaded encodings in load_encodings_to_sim (911af75)
Represent activation QDQ with uint in encodings 2.0.0 in onnx QDQ export (92f63f5)
Allow aimet-onnx to load partial encodings (6636515)
Fix onnx sim.export permanently removing quantizers (9a2a407)
+ 7 more

Version 2.8.0

aimetci·12mo ago·June 18, 2025


📋 What's Changed

New Features
ONNX
Update aimet_onnx `QuantizationSimModel.__init__` function signature (cbe67ae)
Defined new AdaRound API `aimet_onnx.apply_adaround` (84edcf5)
Defined new sequential MSE API `aimet_onnx.apply_seq_mse` (836ab1e)
Defined new per-layer sensitivity analysis API `aimet_onnx.analyze_per_layer_sensitivity` (dc34fa4)
Allowed onnx `QuantizationSimModel.compute_encodings` to take iterables (2c8ae88)
PyTorch
+ 12 more

Version 2.7.0

aimetci·1y ago·June 2, 2025


📋 What's Changed

New Features
PyTorch
OmniQuant (experimental) - implement OmniQuant PTQ technique (https://arxiv.org/pdf/2308.13137) for Llama and Qwen2 model families
Bug Fixes and Improvements
ONNX
Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
Support loading encodings for missing quantizers
Set bitwidth of tensor quantizer while loading encodings
+ 5 more

Version 2.6.0

aimetci·1y ago·May 16, 2025


📋 What's Changed

New Features
ONNX
Support for passing onnxruntime EPs directly to `QuantizationSimModel.__init__`
PyTorch
Support for simulating float8 quantization
Experimental: Added `aimet_torch.onnx.export` API for exporting `QuantizationSimModel` to onnx QDQ graph
Bug Fixes and Improvements
ONNX
+ 13 more

