Releases: microsoft/Olive

Olive-ai 0.11.0

29 Jan 23:00

Passes

Quantization

CLI

Evaluation

Bug Fixes and other updates

Olive-ai 0.10.1

11 Nov 18:13

Improvements and Bug Fixes

  • Improve quantization consistency, respect user and mixed-precision overrides, and fix TorchScript export issues with external data loading (#2246)
  • Remove nested model folder in output path for composite model (#2249)
  • Fix bug when saving output models from cache to the specified output path (#2249)

Olive-ai 0.10.0

05 Nov 19:24

New Features

  • Quark Quantization for ONNX Models (#2236) — New QuarkQuantization pass via olive run with support for int8/uint8/int16/uint16/int32/uint32/bf16/bfp16 and CLE/SmoothQuant/AdaRound/AdaQuant.
  • Embedding Quantization & RTN Improvements (#2238) — Added QuantEmbedding, a composable Rtn pass, and a unified checkpoint format aligned with MatMulNBits/GatherBlockQuantized (block/shape constraints enforced; AutoGPTQ/AutoAWQ export updated to 2D params).
  • Word Embedding Tying Surgery (#2240) — TieWordEmbeddings ties input embeddings and lm_head for both unquantized (Gemm) and quantized (MatMulNBits + GatherBlockQuantized) graphs.
  • Custom ONNX Model Naming (#2235) — Allows specifying a custom ONNX model name in the output directory.
  • Intel OpenVINO Weight Compression Pass (#2180) — Adds NNCF-based weight compression for HF/ONNX models to OpenVINO or compressed ONNX.

Improvements

  • AIMET Enhancements (#2158, #2187, #2215) — Adds Sequential MSE, enables AIMET in quantize CLI, and supports manual precision overrides.
  • GPTQ Updates (#2202, #2203) — Supports user-provided module overrides and transformers >= 4.53.
  • Quantization Export Compatibility (#2218) — Updates checks for ort-genai > 0.9.0 and fixes minor OnnxDAG name clashes.
  • Torch Dynamo Export Alignment (#2185) — extract_adapter recovers folded LoRA and decomposes DORA-fused Gemm to MatMul for quantization.
  • Post-Surgery Deduplication (#2228) — Runs DeduplicateHashedInitializersPass after surgeries to remove duplicate initializers.
  • QNN Execution Provider: GPU Enablement (#2220) — Enables QNN-EP GPU, updates StaticLLM and ContextBinaryGeneration, keeps NPU default.
  • Run API Ergonomics (#2199) — olive.run() now accepts a dict run_config.
  • OpenVINO Config Overrides (#2191) — Allows overriding genai_config.json properties in OV encapsulation.
  • ReplaceAttentionMaskValue Robustness (#2213) — Adds Shape to ALLOWED_CONSUMER_OPS for text-encoder graphs.
  • Implicit Olive Version Tagging (#2183) — Automatically embeds the Olive version in saved ONNX model protos.
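The Run API change above (#2199) means a workflow can be described as a plain Python dict instead of a JSON file. A minimal sketch, where the model path and pass fields are illustrative assumptions rather than a verified schema:

```python
# Sketch: passing a workflow config to olive.run() as a plain dict
# (per #2199). The model path and pass fields below are illustrative
# assumptions, not a verified schema.
run_config = {
    "input_model": {"type": "HfModel", "model_path": "microsoft/phi-2"},
    "passes": {"conversion": {"type": "OnnxConversion"}},
    "output_dir": "models/phi2-onnx",
}

# With olive-ai installed, the workflow would then be launched as:
#   import olive
#   olive.run(run_config)
```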

Olive-ai 0.9.3

22 Sep 22:31

New Features

  • Compatibility with Windows ML for ONNX model inference and evaluation (#2052, #2056, #2059, #2084).
  • Gptq quantization supports lm_head quantization and more generic weight packing (#2137).

Improvements

  • optimize CLI supports WebGPU execution provider (#2076) and NVTensorRtRTX execution provider (#2078).
  • quantize CLI supports Gptq pass as an implementation (#2115).
  • Onnx static quantization supports strided calibration data for lower memory usage (#2086).
  • Extra options can be provided directly to the ModelBuilder pass (#2107).
  • LMEvaluator has a new ORT backend with IOBinding, leading to a large runtime speedup (#2133).
  • OnnxFloatToFloat16 allows more granular control through op_include_list and node_include_list (#2134).
  • AIMET quantization pass: Support for exclude op types (#2055), pre-quantized models (#2111), LLM augmented dataloaders (#2108), LPBQ (#2119), and Adaround (#2140).
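As a rough illustration of the granular OnnxFloatToFloat16 controls mentioned above (#2134), a pass entry might restrict conversion to particular op types and named nodes. Only the two option names come from the release notes; the surrounding config shape and the node name are assumptions:

```python
# Hypothetical OnnxFloatToFloat16 pass entry using op_include_list /
# node_include_list (#2134). Option names are from the release notes;
# everything else here is illustrative.
fp16_pass = {
    "type": "OnnxFloatToFloat16",
    "op_include_list": ["MatMul", "Gemm"],                     # convert only these op types
    "node_include_list": ["/model/layers.0/attn/qkv_MatMul"],  # plus these named nodes
}
```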

Deprecation

As per the deprecation warning in the previous release, the following Azure ML related features have been removed:

  • Azure ML system
  • Azure ML resource types: model, datastore, job outputs.
  • Remote workflow
  • Azure ML artifact packaging

Other removed features include:

  • IsolatedORT System (#2070)
  • Quantization Aware Training (#2089)
  • AppendPrePostProcessingOps pass (#2090)
  • SNPE passes (#2098)

Recipes Migration

All recipes have been migrated to the olive-recipes repository.

Olive-ai 0.9.2

07 Aug 17:52

New Features

  • Selective Mixed Precision. (#1898)
  • Native GPTQ Implementation with support for Selective Mixed Precision. (#1949)
  • Blockwise RTN Quantization for ONNX models. (#1899)
  • Ability to add custom metadata in ONNX model. (#1900)
  • New simplified olive optimize CLI command and the olive.quantize() Python API for effortless model optimization with minimal developer input. See CLI usage and Python API docs for more details. (#1996)
  • New command line olive run-pass provides advanced users ability to run individual passes. (#1904)

New Integrations

  • GPTQModel. (#1999)
  • AIMET (#2028). This is a work in progress.
  • ONNX model support while targeting OpenVINO. (#2019)
  • QuarkQuantization: AMD Quark quantization for LLMs. (#2010)
  • VitisGenerateModelLLM for optimized LLM model generation for Vitis AI Execution Provider. (#2010)

Improvements

  • New graph surgeries including dla transformers, DecomposeRotaryEmbedding and DecomposeQuickGelu. (#2018, #1972, #2000)
  • Exposed WorkflowOutput in Python API and added unified APIs for CLI commands. (#1907)
  • Refactored Docker system for simplified setup and execution. (#1990)
  • ExtractAdapters:
    • Added support for DORA and LoHA adapters. (#1611)
  • NVMO quantization:
    • Exposed more configurable parameters: nodes_to_exclude, save_external_data, calibration_params, calibration_providers and int4_block_size support. Add RTN algorithm. (#2004, #1985)
  • OnnxPeepholeOptimizer:
    • Removed fuse_transpose_qat and patch_unsupported_argmax_operator. (#1976)
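The NVMO options listed above can be pictured as a single pass config. Only the option names come from the notes (#2004, #1985); the "type" string and the values are assumptions:

```python
# Illustrative NVMO quantization pass entry. Option names come from the
# release notes; the pass "type" string and the values are assumptions.
nvmo_pass = {
    "type": "NVModelOptQuantization",   # assumed pass type name
    "algorithm": "RTN",                 # RTN algorithm added in #1985
    "nodes_to_exclude": ["lm_head"],    # skip quantizing these nodes
    "int4_block_size": 128,
    "save_external_data": True,
}
```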

Deprecation

Azure ML will be deprecated in the next release, including:

  • Azure ML system
  • Azure ML workspace model
  • Remote workflow

Recipes Migration

All recipes are being migrated to the olive-recipes repository. New recipes will be added and maintained there going forward.

Olive-ai 0.9.1

16 May 05:59

Minor release to fix the following issues:

  • OpenVINO Encapsulation pad_token_id fix (#1847)
  • Add support for Nvidia TensorRT RTX execution provider in Olive (#1852)
  • Basic support for ONNX auto EP selection introduced in onnxruntime v1.22.0 (#1854, #1863)
  • Add Nvidia TensorRT-RTX Olive recipe for vit, clip and bert examples (#1858)
  • gate optimum[openvino] version to <=1.24 (#1864)

Olive-ai 0.9.0

12 May 16:43

Feature Updates

  • Implement lm-eval-harness based LLM quality evaluator for ONNX GenAI models #1720
  • Update minimum supported target opset for ONNX to 17. #1741
  • QDQ support for ModelBuilder pass #1736
  • Refactor OnnxOpVersionConversion to conditionally use onnxscript version converter #1784
  • HQQ Quantizer Pass #1799, #1835
  • Introducing global definitions for Precision & PrecisionBits #1808
  • Improvements in OnnxPeepholeOptimizer #1697, #1698

New Passes

New/Updated Examples

Deprecated Examples

Deprecated Passes

  • InsertBeamSearchOp #1805

Olive-ai 0.8.0

17 Mar 22:14

New Features (Passes)

  • QuaRot performs offline weight rotation
  • SpinQuant performs offline weight rotation
  • StaticLLM converts a dynamic-shaped LLM into a static-shaped LLM for NPUs.
  • GraphSurgeries applies surgeries to an ONNX model. Surgeries are modular and individually configurable.
  • LoHa, LoKr and DoRA finetuning
  • OnnxQuantizationPreprocess applies quantization preprocessing.
  • EPContextBinaryGenerator creates EP-specific context binary ONNX models.
  • ComposeOnnxModels composes split ONNX models.
  • OnnxIOFloat16ToFloat32 replaced with the more generic OnnxIODataTypeConverter
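Since GraphSurgeries surgeries are modular and individually configurable, a pass entry would plausibly list them one by one. The surgeon names below reuse surgeries named elsewhere in these notes; the exact config keys are assumptions:

```python
# Hypothetical GraphSurgeries pass entry: each surgery is configured
# individually. Surgery names are taken from elsewhere in these notes;
# the config keys themselves are assumptions.
graph_surgeries_pass = {
    "type": "GraphSurgeries",
    "surgeries": [
        {"surgeon": "DecomposeRotaryEmbedding"},
        {"surgeon": "DecomposeQuickGelu"},
    ],
}
```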

Command Line Interface

New command line tools have been added and existing tools have been improved.

  • generate_config_file option to save the workflow config file.
  • extract-adapters command to extract multiple adapters from a PyTorch model.
  • Simplified quantize command

Improvements

  • Better output model structure for workflow and CLI runs.
    • New no_artifacts option in the workflow config to disable saving run artifacts such as footprints.
  • Hf data preprocessing:
    • Dataset is truncated if max_samples is set.
    • Empty text entries are filtered out.
    • padding_side is configurable and defaults to "right".
  • SplitModel pass keeps QDQ nodes together in the same split.
  • OnnxPeepholeOptimizer: constant folding + onnxoptimizer added.
  • CaptureSplitInfo: Separate split for memory-intensive modules.
  • OnnxConversion:
    • Dynamic shapes for dynamo export.
    • optimize option to perform constant folding and redundancies elimination on dynamo exported model.
  • GPTQ: Default wikitext calibration dataset. Patch to support newer versions of transformers.
  • MatMulNBitsToQDQ: nodes_to_exclude option.
  • SplitModel: split_assignments option to provide custom split assignments.
  • CaptureSplitInfo: block_to_split can be a single block (str) or multiple blocks (list).
  • OnnxMatMul4Quantizer: Support onnxruntime 1.18+
  • OnnxQuantization:
    • Support onnxruntime 1.18+.
    • op_types_to_exclude option.
    • LLMAugmentedDataLoader augments the calibration data for LLMs with kv cache and other missing inputs.
  • New document theme and organization.
  • Reimplement search logic to include passes in search space.

Examples:

  • New QNN EP examples:
    • SLMs:
      • Phi-3.5
      • Deepseek R1 Distill
      • Llama 3.2
    • MobileNet
    • ResNet
    • CLIP VIT
    • BAAI/bge-small-en-v1.5
    • Table Transformer Detection
    • adetailer
  • Deepseek R1 Distill Finetuning
  • timm MobileNet

Olive-ai 0.7.1.1

14 Nov 19:39

Same as 0.7.1 with updated dependencies for the nvmo extra and the NVIDIA TensorRT Model Optimizer example doc.

Refer to the 0.7.1 release notes for other details.

Olive-ai 0.7.1

12 Nov 20:57

Command Line Interface

New command line tools have been added and existing tools have been improved.

  • olive --help works as expected.
  • auto-opt:
    • The command chooses a set of passes compatible with the provided model type, precision and accelerator information.
    • New options to split a model, either using --num-splits or --cost-model.

Improvements

  • ExtractAdapters:
    • Support LoRA adapter nodes in Stable Diffusion UNet or text-embedding models.
    • Default initializers for quantized adapters so the model can run without adapter inputs.
  • GPTQ:
    • Avoid saving unused bias weights (all zeros).
    • Set use_exllama to False by default to allow exporting and fine-tuning external GPTQ checkpoints.
  • AWQ: Patch autoawq to run quantization on newer transformers versions.
  • Atomic SharedCache operations
  • New CaptureSplitInfo and Split passes to split models into components. Number of splits can be user provided or inferred from a cost model.
  • disable_search is deprecated from pass configuration in Olive workflow configs.
  • OrtSessionParamsTuning redone to use Olive search features.
  • OrtModelOptimizer renamed to OrtPeepholeOptimizer, with some bug fixes.

Examples:

  • Stable Diffusion: New MultiLora Example
  • Phi3: New int quantization example using nvidia-modelopt