qualcomm/aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

30 Releases
Latest: 1w ago
Version 2.32.1 (Latest)
aimet-bot · 1w ago · June 4, 2026

**Full Changelog**: https://github.com/qualcomm/aimet/compare/2.32.0...2.32.1

Version 2.32.0
aimet-bot · 1w ago · June 3, 2026

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Add C++ support for bfloat16 quantization (ca7d3e01b)
  • Fix large model support with protobuf 7.x (9ef22519a)
  • Skip QDQ pair scale/zp in duplicate_shared_initializers (05e8332ce)
  • Handle Identity passthrough in duplicate_shared_initializers (1b27d9841)
  • Fix SpinQuant embed_tokens filter to exclude non-embedding Gathers (81b80411a)
  • Inline fused supergroups after encoding propagation (68fdcb673)
  • + 4 more
Version 2.31.0
bhushan23 · 3w ago · May 20, 2026

📋 Changes

  • New Features
  • ONNX
  • Support Qwen 3VL in AdaScale ONNX (35d2440db)
  • Torch
  • Add Gemma 3 support for AdaScale (a2da0de9c)
  • LoRA integration (0b90d8a4f)
  • Removed Features
  • Torch
  • + 17 more
Version 2.30.0
aimetci · 1mo ago · May 5, 2026

📋 Changes

  • New Features
  • ONNX
  • Extend SpinQuant support for Vision-Language Models (VLM) (e5cd62847)
  • Torch
  • Remove legacy aimet_torch v1. v2 is now the sole API (4192da749, 00e3c7220)
  • Bug fixes and Improvements
  • ONNX
  • Improve set_and_freeze_param_encodings (39bf1f69b, d320cae7e)
  • + 7 more
Version 2.29.0
aimetci · 1mo ago · April 20, 2026

📋 Changes

  • New Features
  • ONNX
  • Add support for Qwen 2.5 VL in aimet-onnx (f25668610)
  • Torch
  • Support OOTB quantization of nn.MultiHeadAttention (4d19f470f)
  • Support OOTB quantization of Qwen 3.5 normalization layers (01b912f65)
  • Support OOTB quantization of InternVL GELU (c5f65b782)
  • Bug fixes and Improvements
  • + 11 more
Version 2.28.0
aimetci · 2mo ago · April 6, 2026

📋 Changes

  • New Features
  • Torch
  • Add resumable checkpointing for AdaScale optimization (20ecb0a)
  • Common
  • Migrate pybind11 bindings to Cython using Python's Stable ABI to enable Python-version-independent wheels (0d6f856)
  • Bug fixes and Improvements
  • Torch
  • Fix rescale encodings not propagating with shared scale values (d9f3a90)
  • + 2 more
Version 2.27.0
aimetci · 2mo ago · March 26, 2026

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Add `force_activation_as` option to export APIs to control activation signedness (3583462)
  • Torch
  • Reduce quantize-dequantize latency overhead (9ca3bf4, 525e993, b3de9a2)
  • Optimize inference speed for GenAITests models (cacd5cc, b6ea5bd, 30ab60a)
  • Allow checkpointing and loading during SeqMSE optimization (4eb97f0)
  • Fix SeqMSE error when model contains unquantized Conv/Linear layers (3dd4ca9)
  • + 4 more
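
Several items in this release concern the quantize-dequantize (QDQ) step used to simulate quantization. As a rough illustration only, and not AIMET's implementation, a scalar affine QDQ can be sketched in plain Python. The function name and the zero-point convention (the one used by the ONNX QuantizeLinear and DequantizeLinear operators) are assumptions made for this sketch.

```python
def quantize_dequantize(x, scale, zero_point, qmin, qmax):
    """Hypothetical helper: simulate affine quantization of one float value.

    The value is snapped to the integer grid, clamped to [qmin, qmax],
    and then mapped back to floating point.
    """
    # Python's round() rounds halves to even, matching ONNX QuantizeLinear.
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))       # saturate to the integer range
    return (q - zero_point) * scale   # dequantize back to float


# A value inside the range lands on the nearest grid point (about 0.3).
inside = quantize_dequantize(0.26, scale=0.1, zero_point=0, qmin=-128, qmax=127)

# A value outside the range saturates at qmax * scale (about 12.7).
clipped = quantize_dequantize(100.0, scale=0.1, zero_point=0, qmin=-128, qmax=127)
```

The difference between the input and the returned value is the quantization error that a simulation model exposes to calibration and fine-tuning.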
Version 2.26.0
aimetci · 3mo ago · March 9, 2026

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Implement onnxscript RMSNorm fusion for improved graph optimization (68710d9)
  • Propagate encoding through Concat during ONNX QDQ export (4811a34)
  • Export scale & offset as initializers instead of Constants in ONNX QDQ export (ea9a619)
  • Fix AdaScale (aimet-onnx) for Qwen3 models (beac8f8)
  • Fix BN fold for YOLO models (bae9953)
  • Torch
  • + 20 more
Version 2.25.1
aimetci · 3mo ago · March 3, 2026

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Fix for encoding propagation for concat layers (5084af3)
  • Torch
  • Fix to reduce GPU RAM usage for AdaScale for Qwen 3 VL model (ee3d193)
Version 2.25.0
aimetci · 3mo ago · February 25, 2026

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Reduced peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
  • Reduced peak CUDA memory usage for AdaScale technique (a29f44f)
  • Added support for Qwen3 VL models in GenAITests (c014961)
  • ONNX-IR based supergroup pattern detection and replacement (9972c1b)
  • Tie concat and interpolation ops by default (a8ac6f4)
  • Torch
  • + 7 more
Version 2.24.0
aimetci · 4mo ago · February 10, 2026

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Add Windows ARM64 wheel build/test support, distribute Windows ARM64 wheel on GitHub releases (1390b96)
  • Add transpose MatMul support in Sequential MSE (ff7a284)
  • Torch
  • Expose block-level AdaScale API (72246db)
  • Improve numerical stability of zero point shifting ([-1.5, -.5, .5, 1.5]) implementation (489f7df)
  • Fix `replace_lora_layers_with_quantizable_layers` to inherit train/eval flag (af5a82d)
  • + 6 more
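
The zero point shifting item in this release lists the grid [-1.5, -.5, .5, 1.5]. One reading of that grid, offered here as an assumption rather than a description of AIMET's code, is a signed integer grid shifted by half a step so that the levels are symmetric around zero and zero itself is not representable. The helper names below are hypothetical.

```python
import math


def shifted_grid(bitwidth):
    """Hypothetical helper: signed integer levels shifted by +0.5."""
    lo = -(2 ** (bitwidth - 1))
    hi = 2 ** (bitwidth - 1) - 1
    return [q + 0.5 for q in range(lo, hi + 1)]


def qdq_shifted(x, scale, bitwidth):
    """Snap x to the nearest level of the half-step-shifted grid, then rescale."""
    lo = -(2 ** (bitwidth - 1))
    hi = 2 ** (bitwidth - 1) - 1
    # For v in [n, n + 1), the nearest half-integer level is n + 0.5,
    # so floor() selects the integer part of the level.
    q = math.floor(x / scale)
    q = max(lo, min(hi, q))           # saturate to the representable levels
    return (q + 0.5) * scale


levels = shifted_grid(2)              # the four 2-bit levels
```

With a bitwidth of 2 this reproduces the four levels quoted in the release note; small positive and negative inputs map to 0.5 and -0.5 times the scale respectively.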
Version 2.23.0
aimetci · 4mo ago · January 28, 2026

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Disable per-channel quantization for ConvTranspose ops (9395e32)
  • New top level API for configuring parameter quantization type (a1c197d)
  • Torch
  • Enable Torch Dynamo ONNX export (59e0125)
  • Common
  • Enable per-channel matmul quantization in config files (7137849)
  • + 2 more
Version 2.22.0
aimetci · 5mo ago · January 13, 2026

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Allow loading 2.0.0 encoding format to sim (e8cb098)
  • Fix Cast unpacking error (6761a19)
  • Enable exporting non-LPBQ encodings with zero_point shift (7b3cc4c)
  • Implement aimet-onnx LPBQEncoding (5ad7ea6)
  • Common
  • Support exporting 1x1 Conv LPBQ to ONNX QDQ (58ce71d)
Version 2.21.0
aimetci · 6mo ago · December 15, 2025

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Fix IndexError when Conv or Linear layers are reused in the model (65c4b3b)
  • Add optional argument `export_int32_bias` to aimet-onnx export (3b8e0f0)
  • Unpin PyTorch version in aimet-onnx (d99b6c4)
  • Align NaN handling with ORT CPU Execution Provider (e4c49eb)
  • Fix quantization axis handling for transposed MatMul operations (6ca06d6)
  • PyTorch
  • + 1 more
Version 2.20.0
aimetci · 6mo ago · December 2, 2025

📋 Changes

  • Bug fixes and Improvements
  • Common
  • Update supported python version to >=3.10 ([2bc8c94](https://github.com/quic/aimet/commit/2bc8c94fcced5ceff790f2c8a0b8347ee42f0be1))
  • Repackage aimet_common as alias to aimet_onnx.common or aimet_torch.common ([074e85f](https://github.com/quic/aimet/commit/074e85fd15b92c2b65b03059374a5272f07bdeb5))
  • Remove Pad op from data movement ops ([21cddb6](https://github.com/quic/aimet/commit/21cddb68889e3d01843de8744e8493f6daa3db28))
  • ONNX
  • Export data movement op output encoding in sim.export by default ([550c029](https://github.com/quic/aimet/commit/550c0291d074626e555db6b6a5fa3239f333787e))
  • Assign generic node names if node name is missing or duplicate ([273dd82](https://github.com/quic/aimet/commit/273dd8202489205ff39d20d52a227053ee6cd2e6))
  • + 18 more
Version 2.19.0
aimetci · 6mo ago · November 19, 2025

📋 Changes

  • New Features
  • Bug fixes and Improvements
  • ONNX
  • Make LiteMP API percentage float (69f96ff)
  • Set layernorm int16 weight to symmetric by default (8560e13)
  • Automatically insert data movement op output qdq during to_onnx_qdq (15c8b9b)
  • Create LazyExtractor to handle external data for onnx Extractor utils (104e7e8)
  • Tie input/output encodings across maximum Concat subgraph (832ea91)
  • + 6 more
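
The Concat item in this release refers to tying input and output encodings, meaning every tensor entering or leaving the Concat shares one scale and zero point. The release note does not say how the shared encoding is chosen; the sketch below assumes the simplest option, the union of the observed ranges, and uses hypothetical helper names.

```python
def tie_ranges(ranges):
    """Hypothetical helper: merge per-tensor (min, max) ranges into one shared range."""
    lo = min(r[0] for r in ranges)
    hi = max(r[1] for r in ranges)
    return lo, hi


def affine_params(lo, hi, bitwidth=8):
    """Derive an unsigned affine scale and zero point covering [lo, hi]."""
    # Extend the range to include zero so that 0.0 is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:
        raise ValueError("range must have non-zero width")
    qmax = 2 ** bitwidth - 1
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    return scale, zero_point


# Two Concat inputs with different observed ranges share one encoding.
shared = tie_ranges([(-1.0, 2.0), (0.0, 4.1)])
scale, zero_point = affine_params(*shared)
```

Sharing one encoding lets the Concat be executed as pure data movement on integer tensors, at the cost of coarser resolution for the input with the narrower range.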
Version 2.18.0
aimetci · 7mo ago · November 6, 2025

📋 Changes

  • New Features
  • Torch
  • Promoted aimet_torch.onnx.export and QuantizationSimModel.onnx.export as production APIs (99160d2, e026fd1)
  • Added utility functions to exclude some or all unknown nn.Modules from quantization (5a419f3, 501eebd)
  • Bug fixes and Improvements
  • ONNX
  • Fixed supergroup misidentification bug upon MatMul-MatMul-Add sequence (ab63866)
  • Torch
  • + 4 more
Version 2.17.0
aimetci · 7mo ago · October 20, 2025

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Optimize SeqMSE latency and CPU memory usage (434ac6b)
  • Support excluding nodes from SeqMSE optimization (6a37239)
  • Support exporting large models (> 2GB) to ONNX QDQ (b1dafe6, 1bf8b82)
  • Support exporting float16 ONNX models to ONNX QDQ (66ccb45)
  • Allow disabling MatMul-Add supergroup via config file (e49660c)
  • Fix bug where on-disk tensor data is deleted before InferenceSession (d57a934)
  • + 7 more
Version 2.16.0
aimetci · 8mo ago · October 7, 2025

📋 Changes

  • ONNX
  • Experimental - Added AdaScale, a post-training quantization technique ([5e23ceb](https://github.com/quic/aimet/commit/5e23cebea551c074f7a380ef2f385fd95433bb53))
  • ONNX
  • Skip tying Concat input/output quantizers with conflicting encoding constraints ([b924107](https://github.com/quic/aimet/commit/b9241073256c4a455426451efbc1f3d0672e37b2))
  • Small updates to FPT Quant for improved accuracy ([ba10947](https://github.com/quic/aimet/commit/ba10947bdbdecdf2980f076560453991c3888e77))
  • Implement partial encoding freezing mechanism in aimet-onnx ([658ec3c](https://github.com/quic/aimet/commit/658ec3c20be379b582321171e28f92e8fab1102b))
  • Add Relu partial encoding constraints to HTP config files ([dc8d978](https://github.com/quic/aimet/commit/dc8d978f672e5a93ecb5c8de64017ccaf949d2bf))
  • Clear encoding analyzer stats after computing param encodings ([3d4725f](https://github.com/quic/aimet/commit/3d4725fc172bffeadd87ee993b7a30e5d51691b2))
  • + 5 more
Version 2.15.1
aimetci · 8mo ago · September 27, 2025

📋 Changes

  • ONNX
  • Experimental - Added AdaScale, a post-training quantization technique ([5e23ceb](https://github.com/quic/aimet/commit/5e23cebea551c074f7a380ef2f385fd95433bb53))
Version 2.15.0
aimetci · 8mo ago · September 22, 2025

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Throw an error on `bfloat16` models (5181860)
  • Added docs and examples for LiteMP (3d5e0dd)
  • Export to QDQ ONNX with pre-quantized constants (a97354f)
  • PyTorch
  • Fix multiple dispatch issue when torch function is called in nested context manager (6216ca0)
  • Keras
  • + 3 more
Version 2.14.0
aimetci · 9mo ago · September 8, 2025

📋 Changes

  • New Feature
  • ONNX
  • Add support for FP16 in `QuantizationSimModel` (2494d90)
  • Bug fixes and Improvements
  • ONNX
  • Add sequential MSE support for `onnx >= 1.18.0` (754d030)
  • Improve histogram granularity during TFE calibration (91109af)
  • Improve runtime for `QuantizationSimModel` creation for large models like LLMs (f7e700f)
  • + 9 more
Version 2.13.0
aimetci · 9mo ago · August 26, 2025

📋 Changes

  • Bug fixes and Improvements
  • ONNX
  • Adjust weight scale for int32 bias overflow in W16A16 quantization (f39c0bf)
  • AutoQuant: Remove deprecated feature (414cdde)
  • Support exporting large models in aimet-onnx (0fe6701)
  • AdaRound: Delete deprecated top-level API. (bfba557)
  • AdaRound: Skip optimization if no input to layer (18dfedc)
  • PyTorch
  • + 4 more
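
The int32 bias overflow item in this release can be motivated with a little arithmetic. When a bias is stored as an int32 with scale equal to weight scale times activation scale, very small 16-bit scales can push the integer bias past the int32 limit. The sketch below shows one plausible remedy, enlarging the weight scale until the bias fits; it is an assumption about the general idea, not AIMET's actual algorithm, and the function name is hypothetical.

```python
INT32_MAX = 2 ** 31 - 1


def adjust_weight_scale(bias, weight_scale, act_scale):
    """Hypothetical helper: return a weight scale for which the int32 bias fits.

    The bias is assumed to be quantized with scale weight_scale * act_scale.
    """
    bias_scale = weight_scale * act_scale
    if abs(bias) / bias_scale <= INT32_MAX:
        return weight_scale           # already representable, keep the scale
    # Smallest weight scale for which |bias| / (scale * act_scale) reaches INT32_MAX.
    return abs(bias) / (INT32_MAX * act_scale)


# Tiny 16-bit scales: 1000 / (1e-6 * 1e-6) = 1e15, far beyond the int32 range.
new_scale = adjust_weight_scale(1000.0, weight_scale=1e-6, act_scale=1e-6)

# A bias that already fits leaves the weight scale untouched.
same_scale = adjust_weight_scale(1.0, weight_scale=1e-3, act_scale=1e-3)
```

Enlarging the weight scale trades weight resolution for a representable bias, which is why such an adjustment would only be applied to layers where overflow is detected.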
Version 2.12.0
aimetci · 10mo ago · August 13, 2025

📋 Changes

  • Bug fixes and Improvements
  • Common
  • Remove data movement ops from config (ae02aa8)
  • ONNX
  • Exclude bias from quantization when weights are not quantized (62f5879)
  • AdaRound: Fix prelu failing in CUDA model (b2350b2)
  • PyTorch
  • Wrap aimet_torch.onnx.export with torch.no_grad (b73bb71)
  • + 3 more
Version 2.11.0
aimetci · 10mo ago · July 29, 2025

📋 Changes

  • New Feature
  • PyTorch
  • SpinQuant (experimental) - implement SpinQuant PTQ technique (https://arxiv.org/pdf/2405.16406) for Llama, Qwen2, and Mistral families (R1 rotation w/o optimization) (7364b37)
  • Enable AdaScale and OmniQuant for Mistral (d33e98c)
  • ONNX
  • Enable llm_configurator for Llama (Experimental) (08c17b8)
  • Bug fixes and Improvements
  • Common
  • + 22 more
Version 2.10.0
aimetci · 11mo ago · July 14, 2025

📋 What's Changed

  • New Feature
  • Promote to_onnx_qdq to a public API (f333188). Note: This is currently a beta feature
  • Bug fixes and Improvements
  • Common
  • Added hover tooltip to plot per layer sensitivity. Changed x-axis to plot layer indices instead of names (c96894f)
  • PyTorch
  • Implement scaling factor in aimet-torch float QDQ (9b8c655)
  • Fix CustomSiLU bug (499df9f)
  • + 20 more
Version 2.9.0
aimetci · 11mo ago · July 1, 2025

📋 What's Changed

  • Bug Fixes and Improvements
  • ONNX
  • Rename QuantizeLinear outputs from <...>_int to <...>_q in onnx QDQ export (e78dbec)
  • Preserve I/O names in onnx QDQ export (35ad990)
  • Allow freezing loaded encodings in load_encodings_to_sim (911af75)
  • Represent activation QDQ with uint in encodings 2.0.0 in onnx QDQ export (92f63f5)
  • Allow aimet-onnx to load partial encodings (6636515)
  • Fix onnx sim.export permanently removing quantizers (9a2a407)
  • + 7 more
Version 2.8.0
aimetci · 12mo ago · June 18, 2025

📋 What's Changed

  • New Features
  • ONNX
  • Update aimet_onnx `QuantizationSimModel.__init__` function signature (cbe67ae)
  • Defined new AdaRound API `aimet_onnx.apply_adaround` (84edcf5)
  • Defined new sequential MSE API `aimet_onnx.apply_seq_mse` (836ab1e)
  • Defined new per-layer sensitivity analysis API `aimet_onnx.analyze_per_layer_sensitivity` (dc34fa4)
  • Allowed onnx `QuantizationSimModel.compute_encodings` to take iterables (2c8ae88)
  • PyTorch
  • + 12 more
Version 2.7.0
aimetci · 1y ago · June 2, 2025

📋 What's Changed

  • New Features
  • PyTorch
  • OmniQuant (experimental) - implement OmniQuant PTQ technique (https://arxiv.org/pdf/2308.13137) for Llama and Qwen2 model families
  • Bug Fixes and Improvements
  • ONNX
  • Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
  • Support loading encodings for missing quantizers
  • Set bitwidth of tensor quantizer while loading encodings
  • + 5 more
Version 2.6.0
aimetci · 1y ago · May 16, 2025

📋 What's Changed

  • New Features
  • ONNX
  • Support for passing onnxruntime EPs directly to `QuantizationSimModel.__init__`
  • PyTorch
  • Support for simulating float8 quantization
  • Experimental: Added `aimet_torch.onnx.export` API for exporting `QuantizationSimModel` to onnx QDQ graph
  • Bug Fixes and Improvements
  • ONNX
  • + 13 more