qualcomm/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
30 Releases
Latest: 1w ago
Version 2.32.1 (Latest)
**Full Changelog**: https://github.com/qualcomm/aimet/compare/2.32.0...2.32.1
Version 2.32.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Add C++ support for bfloat16 quantization (ca7d3e01b)
- Fix large model support with protobuf 7.x (9ef22519a)
- Skip QDQ pair scale/zp in duplicate_shared_initializers (05e8332ce)
- Handle Identity passthrough in duplicate_shared_initializers (1b27d9841)
- Fix SpinQuant embed_tokens filter to exclude non-embedding Gathers (81b80411a)
- Inline fused supergroups after encoding propagation (68fdcb673)
- + 4 more
Version 2.31.0
📋 Changes
- New Features
- ONNX
- Support Qwen 3VL in AdaScale ONNX (35d2440db)
- Torch
- Add Gemma 3 support for AdaScale (a2da0de9c)
- LoRA integration (0b90d8a4f)
- Removed Features
- Torch
- + 17 more
Version 2.30.0
📋 Changes
- New Features
- ONNX
- Extend SpinQuant support for Vision-Language Models (VLM) (e5cd62847)
- Torch
- Remove legacy aimet_torch v1; v2 is now the sole API (4192da749, 00e3c7220)
- Bug fixes and Improvements
- ONNX
- Improve set_and_freeze_param_encodings (39bf1f69b, d320cae7e)
- + 7 more
Version 2.29.0
📋 Changes
- New Features
- ONNX
- Add support for Qwen 2.5 VL in aimet-onnx (f25668610)
- Torch
- Support OOTB quantization of nn.MultiHeadAttention (4d19f470f)
- Support OOTB quantization of Qwen 3.5 normalization layers (01b912f65)
- Support OOTB quantization of InternVL GELU (c5f65b782)
- Bug fixes and Improvements
- + 11 more
Version 2.28.0
📋 Changes
- New Features
- Torch
- Add resumable checkpointing for AdaScale optimization (20ecb0a)
- Common
- Migrate pybind11 bindings to Cython using Python's Stable ABI to enable Python-version-independent wheels (0d6f856)
- Bug fixes and Improvements
- Torch
- Fix rescale encodings not propagating with shared scale values (d9f3a90)
- + 2 more
Version 2.27.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Add `force_activation_as` option to export APIs to control activation signedness (3583462)
- Torch
- Reduce quantize-dequantize latency overhead (9ca3bf4, 525e993, b3de9a2)
- Optimize inference speed for GenAITests models (cacd5cc, b6ea5bd, 30ab60a)
- Allow checkpointing and loading during SeqMSE optimization (4eb97f0)
- Fix SeqMSE error when model contains unquantized Conv/Linear layers (3dd4ca9)
- + 4 more
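Several entries above refer to quantize-dequantize (QDQ), the operation that simulates integer quantization in floating point. As a minimal sketch, assuming a per-tensor affine scheme with the convention `q = round(x / scale) + zero_point` (AIMET's internal convention and implementation may differ), the operation looks like this. The function name and parameters are hypothetical and are not part of the AIMET API.

```python
def quantize_dequantize(x, scale, zero_point, bitwidth=8, signed=False):
    """Snap a float onto an integer grid, then map it back to float."""
    if signed:
        qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bitwidth - 1
    q = round(x / scale) + zero_point   # quantize
    q = max(qmin, min(qmax, q))         # clamp to the representable range
    return (q - zero_point) * scale     # dequantize


# A 2-bit unsigned grid with scale 0.5 and zero point 2 represents
# the floats {-1.0, -0.5, 0.0, 0.5}; values outside are clamped.
values = [-1.0, -0.26, 0.0, 0.3, 2.0]
simulated = [quantize_dequantize(v, scale=0.5, zero_point=2, bitwidth=2)
             for v in values]
```

Because this runs for every quantized tensor on every forward pass, its latency adds directly to simulation time.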
Version 2.26.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Implement onnxscript RMSNorm fusion for improved graph optimization (68710d9)
- Propagate encoding through Concat during ONNX QDQ export (4811a34)
- Export scale & offset as initializers instead of Constants in ONNX QDQ export (ea9a619)
- Fix AdaScale (aimet-onnx) for Qwen3 models (beac8f8)
- Fix BN fold for YOLO models (bae9953)
- Torch
- + 20 more
Version 2.25.1
📋 Changes
- Bug fixes and Improvements
- ONNX
- Fix encoding propagation for Concat layers (5084af3)
- Torch
- Reduce GPU memory usage of AdaScale for the Qwen 3 VL model (ee3d193)
Version 2.25.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Reduced peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
- Reduced peak CUDA memory usage for AdaScale technique (a29f44f)
- Added support for Qwen3 VL models in GenAITests (c014961)
- ONNX-IR based supergroup pattern detection and replacement (9972c1b)
- Tie concat and interpolation ops by default (a8ac6f4)
- Torch
- + 7 more
Version 2.24.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Add Windows ARM64 wheel build/test support, distribute Windows ARM64 wheel on GitHub releases (1390b96)
- Add transpose MatMul support in Sequential MSE (ff7a284)
- Torch
- Expose block-level AdaScale API (72246db)
- Improve numerical stability of zero point shifting ([-1.5, -0.5, 0.5, 1.5]) implementation (489f7df)
- Fix `replace_lora_layers_with_quantizable_layers` to inherit train/eval flag (af5a82d)
- + 6 more
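The grid [-1.5, -0.5, 0.5, 1.5] in the zero point shifting entry above can be read as a signed 2-bit integer grid [-2, -1, 0, 1] shifted by half a step so that it becomes symmetric around zero. The sketch below illustrates that reading only. The half-step shift, the function names, and the rounding rule are assumptions and do not describe AIMET's implementation.

```python
import math


def signed_range(bitwidth):
    """Integer range of a signed two's-complement grid."""
    return -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1


def shifted_grid(bitwidth, shift=0.5):
    """Representable levels (in units of scale) after shifting the grid."""
    qmin, qmax = signed_range(bitwidth)
    return [q + shift for q in range(qmin, qmax + 1)]


def quantize_dequantize_shifted(x, scale, bitwidth, shift=0.5):
    """Round x to the nearest level of the shifted grid."""
    qmin, qmax = signed_range(bitwidth)
    q = math.floor(x / scale - shift + 0.5)  # nearest shifted level
    q = max(qmin, min(qmax, q))              # clamp to the integer range
    return (q + shift) * scale


grid = shifted_grid(2)
outputs = [quantize_dequantize_shifted(v, scale=1.0, bitwidth=2)
           for v in (-3.0, -0.2, 0.2, 5.0)]
```

With a shifted grid, zero is no longer exactly representable, which is the trade-off for having equally many levels on each side of zero.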
Version 2.23.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Disable per-channel quantization for ConvTranspose ops (9395e32)
- New top level API for configuring parameter quantization type (a1c197d)
- Torch
- Enable Torch Dynamo ONNX export (59e0125)
- Common
- Enable per-channel matmul quantization in config files (7137849)
- + 2 more
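For readers unfamiliar with the per-channel quantization mentioned in this release, the following sketch contrasts a single per-tensor scale with one scale per output channel. It assumes symmetric quantization and uses a hypothetical helper; it is not AIMET's API.

```python
def symmetric_scale(values, bitwidth=8):
    """Scale that maps the largest magnitude onto the top of a signed grid."""
    max_abs = max(abs(v) for v in values)
    return max_abs / (2 ** (bitwidth - 1) - 1)


# Two output channels with very different dynamic ranges.
weight = [
    [0.02, -0.01, 0.03],
    [1.50, -2.54, 0.70],
]

# Per-tensor: one scale, dominated by the largest channel.
per_tensor = symmetric_scale([v for row in weight for v in row])

# Per-channel: each row gets a scale fitted to its own range.
per_channel = [symmetric_scale(row) for row in weight]
```

The small-range channel gets a much finer step under per-channel scaling, which is why the option matters for accuracy. Whether it can be used for a given op type depends on the target runtime.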
Version 2.22.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Allow loading 2.0.0 encoding format to sim (e8cb098)
- Fix Cast unpacking error (6761a19)
- Enable exporting non-LPBQ encodings with zero_point shift (7b3cc4c)
- Implement aimet-onnx LPBQEncoding (5ad7ea6)
- Common
- Support exporting 1x1 Conv LPBQ to ONNX QDQ (58ce71d)
Version 2.21.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Fix IndexError when Conv or Linear layers are reused in the model (65c4b3b)
- Add optional argument `export_int32_bias` to aimet-onnx export (3b8e0f0)
- Unpin PyTorch version in aimet-onnx (d99b6c4)
- Align NaN handling with ORT CPU Execution Provider (e4c49eb)
- Fix quantization axis handling for transposed MatMul operations (6ca06d6)
- PyTorch
- + 1 more
Version 2.20.0
📋 Changes
- Bug fixes and Improvements
- Common
- Update supported python version to >=3.10 ([2bc8c94](https://github.com/quic/aimet/commit/2bc8c94fcced5ceff790f2c8a0b8347ee42f0be1))
- Repackage aimet_common as alias to aimet_onnx.common or aimet_torch.common ([074e85f](https://github.com/quic/aimet/commit/074e85fd15b92c2b65b03059374a5272f07bdeb5))
- Remove Pad op from data movement ops ([21cddb6](https://github.com/quic/aimet/commit/21cddb68889e3d01843de8744e8493f6daa3db28))
- ONNX
- Export data movement op output encoding in sim.export by default ([550c029](https://github.com/quic/aimet/commit/550c0291d074626e555db6b6a5fa3239f333787e))
- Assign generic node names if node name is missing or duplicate ([273dd82](https://github.com/quic/aimet/commit/273dd8202489205ff39d20d52a227053ee6cd2e6))
- + 18 more
Version 2.19.0
📋 Changes
- New Features
- Bug fixes and Improvements
- ONNX
- Make LiteMP API percentage float (69f96ff)
- Set layernorm int16 weight to symmetric by default (8560e13)
- Automatically insert data movement op output qdq during to_onnx_qdq (15c8b9b)
- Create LazyExtractor to handle external data for onnx Extractor utils (104e7e8)
- Tie input/output encodings across maximum Concat subgraph (832ea91)
- + 6 more
Version 2.18.0
📋 Changes
- New Features
- Torch
- Promoted aimet_torch.onnx.export and QuantizationSimModel.onnx.export as production APIs (99160d2, e026fd1)
- Added utility functions to exclude some or all unknown nn.Modules from quantization (5a419f3, 501eebd)
- Bug fixes and Improvements
- ONNX
- Fixed supergroup misidentification for MatMul-MatMul-Add sequences (ab63866)
- Torch
- + 4 more
Version 2.17.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Optimize SeqMSE latency and CPU memory usage (434ac6b)
- Support excluding nodes from SeqMSE optimization (6a37239)
- Support exporting large models (> 2GB) to ONNX QDQ (b1dafe6, 1bf8b82)
- Support exporting float16 ONNX models to ONNX QDQ (66ccb45)
- Allow disabling MatMul-Add supergroup via config file (e49660c)
- Fix bug where on-disk tensor data is deleted before InferenceSession (d57a934)
- + 7 more
Version 2.16.0
📋 Changes
- ONNX
- Experimental - Added AdaScale, a post-training quantization technique ([5e23ceb](https://github.com/quic/aimet/commit/5e23cebea551c074f7a380ef2f385fd95433bb53))
- ONNX
- Skip tying Concat input/output quantizers with conflicting encoding constraints ([b924107](https://github.com/quic/aimet/commit/b9241073256c4a455426451efbc1f3d0672e37b2))
- Small updates to FPT Quant for improved accuracy ([ba10947](https://github.com/quic/aimet/commit/ba10947bdbdecdf2980f076560453991c3888e77))
- Implement partial encoding freezing mechanism in aimet-onnx ([658ec3c](https://github.com/quic/aimet/commit/658ec3c20be379b582321171e28f92e8fab1102b))
- Add Relu partial encoding constraints to HTP config files ([dc8d978](https://github.com/quic/aimet/commit/dc8d978f672e5a93ecb5c8de64017ccaf949d2bf))
- Clear encoding analyzer stats after computing param encodings ([3d4725f](https://github.com/quic/aimet/commit/3d4725fc172bffeadd87ee993b7a30e5d51691b2))
- + 5 more
Version 2.15.1
📋 Changes
- ONNX
- Experimental - Added AdaScale, a post-training quantization technique ([5e23ceb](https://github.com/quic/aimet/commit/5e23cebea551c074f7a380ef2f385fd95433bb53))
Version 2.15.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Throw an error on `bfloat16` models (5181860)
- Added docs and examples for LiteMP (3d5e0dd)
- Export to QDQ ONNX with pre-quantized constants (a97354f)
- PyTorch
- Fix multiple dispatch issue when torch function is called in nested context manager (6216ca0)
- Keras
- + 3 more
Version 2.14.0
📋 Changes
- New Feature
- ONNX
- Add support for FP16 in `QuantizationSimModel` (2494d90)
- Bug fixes and Improvements
- ONNX
- Add sequential MSE support for `onnx >= 1.18.0` (754d030)
- Improve histogram granularity during TFE calibration (91109af)
- Improve runtime for `QuantizationSimModel` creation for large models like LLMs (f7e700f)
- + 9 more
Version 2.13.0
📋 Changes
- Bug fixes and Improvements
- ONNX
- Adjust weight scale for int32 bias overflow in W16A16 quantization (f39c0bf)
- AutoQuant: Remove deprecated feature (414cdde)
- Support exporting large models in aimet-onnx (0fe6701)
- AdaRound: Delete deprecated top-level API. (bfba557)
- AdaRound: Skip optimization if no input to layer (18dfedc)
- PyTorch
- + 4 more
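The int32 bias overflow entry in this release concerns a general property of integer inference: bias is commonly stored as int32 with a scale equal to the product of the weight and activation scales. With 16-bit weights and activations both scales become small, so the quantized bias can exceed the int32 range. The sketch below shows one generic way to enlarge the weight scale until the bias fits. The function name and the exact adjustment rule are assumptions and are not taken from AIMET's source.

```python
INT32_MAX = 2 ** 31 - 1


def adjust_weight_scale(weight_scale, act_scale, bias):
    """Return a weight scale large enough that bias / (w_scale * a_scale)
    stays within the int32 range; unchanged if it already fits."""
    max_abs_bias = max(abs(b) for b in bias)
    min_bias_scale = max_abs_bias / INT32_MAX  # smallest safe bias scale
    if weight_scale * act_scale >= min_bias_scale:
        return weight_scale
    return min_bias_scale / act_scale


# Small 16-bit style scales: 100.0 / (1e-6 * 1e-5) is about 1e13,
# far beyond what int32 can hold.
old_scale = 1e-6
act_scale = 1e-5
bias = [100.0, -3.5]
new_scale = adjust_weight_scale(old_scale, act_scale, bias)
```

Raising the weight scale coarsens the weight grid slightly, so an adjustment like this trades a little weight precision for a representable bias.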
Version 2.12.0
📋 Changes
- Bug fixes and Improvements
- Common
- Remove data movement ops from config (ae02aa8)
- ONNX
- Exclude bias from quantization when weights are not quantized (62f5879)
- AdaRound: Fix prelu failing in CUDA model (b2350b2)
- PyTorch
- Wrap aimet_torch.onnx.export with torch.no_grad (b73bb71)
- + 3 more
Version 2.11.0
📋 Changes
- New Feature
- PyTorch
- SpinQuant (experimental) - implement SpinQuant PTQ technique (https://arxiv.org/pdf/2405.16406) for Llama, Qwen2, and Mistral families (R1 rotation without optimization) (7364b37)
- Enable AdaScale and OmniQuant for Mistral (d33e98c)
- ONNX
- Enable llm_configurator for Llama (Experimental) (08c17b8)
- Bug fixes and Improvements
- Common
- + 22 more
Version 2.10.0
📋 What's Changed
- New Feature
- Promote to_onnx_qdq to a public API (f333188). Note: This is currently a beta feature
- Bug fixes and Improvements
- Common
- Added hover tooltip to the per-layer sensitivity plot; changed the x-axis to show layer indices instead of names (c96894f)
- PyTorch
- Implement scaling factor in aimet-torch float QDQ (9b8c655)
- Fix CustomSiLU bug (499df9f)
- + 20 more
Version 2.9.0
📋 What's Changed
- Bug Fixes and Improvements
- ONNX
- Rename QuantizeLinear outputs from <...>_int to <...>_q in onnx QDQ export (e78dbec)
- Preserve I/O names in onnx QDQ export (35ad990)
- Allow freezing loaded encodings in load_encodings_to_sim (911af75)
- Represent activation QDQ with uint in encodings 2.0.0 in onnx QDQ export (92f63f5)
- Allow aimet-onnx to load partial encodings (6636515)
- Fix onnx sim.export permanently removing quantizers (9a2a407)
- + 7 more
Version 2.8.0
📋 What's Changed
- New Features
- ONNX
- Update aimet_onnx `QuantizationSimModel.__init__` function signature (cbe67ae)
- Defined new AdaRound API `aimet_onnx.apply_adaround` (84edcf5)
- Defined new sequential MSE API `aimet_onnx.apply_seq_mse` (836ab1e)
- Defined new per-layer sensitivity analysis API `aimet_onnx.analyze_per_layer_sensitivity` (dc34fa4)
- Allowed onnx `QuantizationSimModel.compute_encodings` to take iterables (2c8ae88)
- PyTorch
- + 12 more
Version 2.7.0
📋 What's Changed
- New Features
- PyTorch
- OmniQuant (experimental) - implement OmniQuant PTQ technique (https://arxiv.org/pdf/2308.13137) for Llama and Qwen2 model families
- Bug Fixes and Improvements
- ONNX
- Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
- Support loading encodings for missing quantizers
- Set bitwidth of tensor quantizer while loading encodings
- + 5 more
Version 2.6.0
📋 What's Changed
- New Features
- ONNX
- Support for passing onnxruntime EPs directly to `QuantizationSimModel.__init__`
- PyTorch
- Support for simulating float8 quantization
- Experimental: Added `aimet_torch.onnx.export` API for exporting `QuantizationSimModel` to onnx QDQ graph
- Bug Fixes and Improvements
- ONNX
- + 13 more
