29 Jan 23:00

shaahji

562e50b

Olive-ai 0.11.0 (Latest)

Passes

Quantization

CLI

Evaluation

Add size_on_disk API to OliveModelHandler and corresponding metric
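As a rough illustration of what a size-on-disk metric measures, the sketch below sums the bytes of every file under a model path, so an ONNX model stored with external data files is counted in full. The helper name `model_size_on_disk` is hypothetical; this is not the OliveModelHandler API added in this release.

```python
import os
import tempfile
from pathlib import Path


def model_size_on_disk(model_path: str) -> int:
    """Return the total size in bytes of a model file or model directory.

    Hypothetical helper: a directory (e.g. model.onnx plus external data
    files) is walked recursively and every regular file is counted.
    """
    path = Path(model_path)
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += (Path(root) / name).stat().st_size
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "model.onnx").write_bytes(b"\x00" * 128)
        Path(tmp, "model.onnx.data").write_bytes(b"\x00" * 1024)
        print(model_size_on_disk(tmp))  # 1152
```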

Bug Fixes and other updates

11 Nov 18:13

xiaoyu-work

v0.10.1

32d4645

Olive-ai 0.10.1

Improvements and Bug Fixes

Improve quantization consistency, respect user and mixed-precision overrides, and fix TorchScript export issues with external data loading (#2246)
Remove nested model folder in output path for composite model (#2249)
Fix bug when saving output models from cache to the specified output path (#2249)

05 Nov 19:24

xiaoyu-work

v0.10.0

4705557

Olive-ai 0.10.0

New Features

Quark Quantization for ONNX Models (#2236) — New QuarkQuantization pass via olive run with support for int8/uint8/int16/uint16/int32/uint32/bf16/bfp16 and CLE/SmoothQuant/AdaRound/AdaQuant.
Embedding Quantization & RTN Improvements (#2238) — Added QuantEmbedding, a composable Rtn pass, and a unified checkpoint format aligned with MatMulNBits/GatherBlockQuantized (block/shape constraints enforced; AutoGPTQ/AutoAWQ export updated to 2D params).
Word Embedding Tying Surgery (#2240) — TieWordEmbeddings ties input embeddings and lm_head for both unquantized (Gemm) and quantized (MatMulNBits + GatherBlockQuantized) graphs.
Custom ONNX Model Naming (#2235) — Allows specifying a custom ONNX model name in the output directory.
Intel OpenVINO Weight Compression Pass (#2180) — Adds NNCF-based weight compression for HF/ONNX models to OpenVINO or compressed ONNX.

Improvements

AIMET Enhancements (#2158, #2187, #2215) — Adds Sequential MSE, enables AIMET in quantize CLI, and supports manual precision overrides.
GPTQ Updates (#2202, #2203) — Supports user-provided module overrides and transformers >= 4.53.
Quantization Export Compatibility (#2218) — Updates checks for ort-genai > 0.9.0 and fixes minor OnnxDAG name clashes.
Torch Dynamo Export Alignment (#2185) — extract_adapter recovers folded LoRA and decomposes DoRA-fused Gemm to MatMul for quantization.
Post-Surgery Deduplication (#2228) — Runs DeduplicateHashedInitializersPass after surgeries to remove duplicate initializers.
QNN Execution Provider: GPU Enablement (#2220) — Enables QNN-EP GPU, updates StaticLLM and ContextBinaryGeneration, keeps NPU default.
Run API Ergonomics (#2199) — olive.run() now accepts a dict run_config.
OpenVINO Config Overrides (#2191) — Allows overriding genai_config.json properties in OV encapsulation.
ReplaceAttentionMaskValue Robustness (#2213) — Adds Shape to ALLOWED_CONSUMER_OPS for text-encoder graphs.
Implicit Olive Version Tagging (#2183) — Automatically embeds the Olive version in saved ONNX model protos.
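The idea behind post-surgery deduplication can be sketched as follows: initializers whose dtype, shape and bytes hash to the same digest are merged, and node inputs are rewired to the surviving copy. The `Graph` and `Initializer` classes are simplified stand-ins for ONNX protos, and this is an illustration of the technique, not Olive's DeduplicateHashedInitializersPass implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Initializer:
    dtype: str
    dims: Tuple[int, ...]
    raw: bytes


@dataclass
class Graph:
    initializers: Dict[str, Initializer]
    # One list of input names per node, in graph order.
    node_inputs: List[List[str]] = field(default_factory=list)


def _digest(init: Initializer) -> str:
    h = hashlib.sha256()
    h.update(init.dtype.encode())
    h.update(repr(init.dims).encode())
    h.update(init.raw)
    return h.hexdigest()


def deduplicate_initializers(graph: Graph) -> Dict[str, str]:
    """Merge initializers with identical dtype, shape and raw bytes.

    Returns a mapping {removed_name: kept_name}.
    """
    kept_by_digest: Dict[str, str] = {}
    replaced: Dict[str, str] = {}
    for name, init in list(graph.initializers.items()):
        digest = _digest(init)
        kept = kept_by_digest.get(digest)
        # Compare the full tensors too, so a hash collision never merges different data.
        if kept is not None and graph.initializers[kept] == init:
            replaced[name] = kept
            del graph.initializers[name]
        else:
            kept_by_digest.setdefault(digest, name)
    # Rewire every consumer of a removed initializer to the surviving copy.
    graph.node_inputs = [
        [replaced.get(inp, inp) for inp in inputs] for inputs in graph.node_inputs
    ]
    return replaced
```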

22 Sep 22:31

jambayk

v0.9.3

413c6bf

Olive-ai 0.9.3

New Features:

Compatibility with Windows ML for ONNX model inference and evaluation (#2052, #2056, #2059, #2084).
GPTQ quantization supports lm_head quantization and more generic weight packing (#2137).

Improvements

optimize CLI supports WebGPU execution provider (#2076) and NVTensorRtRTX execution provider (#2078).
quantize CLI supports Gptq pass as an implementation (#2115).
ONNX static quantization supports strided calibration data for lower memory usage (#2086).
Extra options can be provided directly to the ModelBuilder pass (#2107).
LMEvaluator has a new ORT backend with IOBinding, resulting in a large runtime speedup (#2133).
OnnxFloatToFloat16 allows more granular control through op_include_list and node_include_list (#2134).
AIMET quantization pass: Support for exclude op types (#2055), pre-quantized models (#2111), LLM augmented dataloaders (#2108), LPBQ (#2119), and Adaround (#2140).
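One way to see why strided calibration lowers memory use: for min/max calibration, computing ranges over fixed-size chunks ("strides") of calibration samples and merging them gives the same result as collecting every activation at once, while only one chunk is held in memory at a time. The sketch below assumes plain min/max calibration over Python lists of activation values; it is not the ONNX Runtime calibrator itself.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunks(samples: Iterable[List[float]], stride: int) -> Iterator[List[List[float]]]:
    """Yield successive groups of at most `stride` samples from an iterable."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    it = iter(samples)
    while True:
        batch = list(islice(it, stride))
        if not batch:
            return
        yield batch


def strided_min_max(samples: Iterable[List[float]], stride: int) -> Tuple[float, float]:
    """Global (min, max) over all values, materializing one stride at a time."""
    lo, hi = float("inf"), float("-inf")
    for batch in chunks(samples, stride):
        # Only this stride's values are held in memory here.
        values = [v for sample in batch for v in sample]
        if values:
            lo = min(lo, min(values))
            hi = max(hi, max(values))
    return lo, hi
```

Passing a generator that produces activations lazily keeps peak memory proportional to the stride rather than to the full calibration set.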

Deprecation

As per the deprecation warning in the previous release, the following Azure ML related features have been removed:

Azure ML system
Azure ML resource types: model, datastore, job outputs.
Remote workflow
Azure ML artifact packaging

Other removed features include:

IsolatedORT System (#2070)
Quantization Aware Training (#2089)
AppendPrePostProcessingOps pass (#2090)
SNPE passes (#2098)

Recipes Migration

All recipes have been migrated to olive-recipes repository.

07 Aug 17:52

xiaoyu-work

v0.9.2

b2d32b2

Olive-ai 0.9.2

New Features:

Selective Mixed Precision. (#1898)
Native GPTQ Implementation with support for Selective Mixed Precision. (#1949)
Blockwise RTN Quantization for ONNX models. (#1899)
Ability to add custom metadata in ONNX model. (#1900)
New simplified olive optimize CLI command and the olive.quantize() Python API for effortless model optimization with minimal developer input. See CLI usage and Python API docs for more details. (#1996)
New command olive run-pass gives advanced users the ability to run individual passes. (#1904)
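Blockwise round-to-nearest (RTN) quantization can be illustrated in a few lines: the weights are split into fixed-size blocks, each block gets its own scale, and every value is rounded to the nearest integer level. The sketch below assumes symmetric signed 4-bit quantization on a flat list; the block size, bit width and symmetric scheme are illustrative choices, not the defaults of Olive's RTN pass.

```python
from typing import List, Tuple


def rtn_quantize_blockwise(
    weights: List[float], block_size: int = 4, bits: int = 4
) -> Tuple[List[int], List[float]]:
    """Symmetric round-to-nearest quantization with one scale per block."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    qmin = -(2 ** (bits - 1))  # -8 for 4-bit
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start : start + block_size]
        absmax = max(abs(w) for w in block)
        scale = absmax / qmax if absmax > 0 else 1.0
        scales.append(scale)
        # Python's round() rounds to nearest, with ties going to the even integer.
        q.extend(max(qmin, min(qmax, round(w / scale))) for w in block)
    return q, scales


def rtn_dequantize_blockwise(q: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Map each integer back to a float using its block's scale."""
    return [v * scales[i // block_size] for i, v in enumerate(q)]
```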

New Integrations

GPTQModel. (#1999)
AIMET (#2028). This is a work in progress.
ONNX model support while targeting OpenVINO. (#2019)
QuarkQuantization: AMD Quark quantization for LLMs. (#2010)
VitisGenerateModelLLM for optimized LLM model generation for Vitis AI Execution Provider. (#2010)

Improvements

New graph surgeries including dla transformers, DecomposeRotaryEmbedding and DecomposeQuickGelu. (#2018, #1972, #2000)
Exposed WorkflowOutput in Python API and added unified APIs for CLI commands. (#1907)
Refactored Docker system for simplified setup and execution. (#1990)
ExtractAdapters:
- Added support for DoRA and LoHa adapters. (#1611)
NVMO quantization:
- Exposed more configurable parameters: nodes_to_exclude, save_external_data, calibration_params, calibration_providers and int4_block_size. Added the RTN algorithm. (#2004, #1985)
OnnxPeepholeOptimizer:
- Removed fuse_transpose_qat and patch_unsupported_argmax_operator. (#1976)

Deprecation

Azure ML will be deprecated in the next release, including:

Azure ML system
Azure ML workspace model
Remote workflow

Recipes Migration

All recipes are being migrated to the olive-recipes repository. New recipes will be added and maintained there going forward.

16 May 05:59

shaahji

v0.9.1

de1da79

Olive-ai 0.9.1

Minor release to fix the following issues:

OpenVINO Encapsulation pad_token_id fix (#1847)
Add support for the NVIDIA TensorRT RTX execution provider in Olive (#1852)
Basic support for ONNX auto EP selection introduced in onnxruntime v1.22.0 (#1854, #1863)
Add NVIDIA TensorRT-RTX Olive recipes for the ViT, CLIP and BERT examples (#1858)
Pin optimum[openvino] to <=1.24 (#1864)

12 May 16:43

shaahji

v0.9.0

4e2f0ec

Olive-ai 0.9.0

Feature Updates

Implement lm-eval-harness based LLM quality evaluator for ONNX GenAI models #1720
Update minimum supported target opset for ONNX to 17. #1741
QDQ support for ModelBuilder pass #1736
Refactor OnnxOpVersionConversion to conditionally use onnxscript version converter #1784
HQQ Quantizer Pass #1799, #1835
Introducing global definitions for Precision & PrecisionBits #1808
Improvements in OnnxPeepholeOptimizer #1697, #1698

New Passes

OnnxScriptFusion: ONNX script fusion
OpenVINOEncapsulation, OpenVINOReshape, OpenVINOIoUpdate: OpenVINO encapsulation #1754
TrtMatMulToConvTransform: Convert non-4D MatMul to Transpose-Conv-Transpose sequence
OpenVINOOptimumConversion: Add an Optimum Intel® pass for converting a Hugging Face model to an OpenVINO model
Graph Surgeries
- MatMulAddGemm: Graph surgery to transform a MatMul op followed by an Add op into a Gemm op
- PowReduceSumPowDiv2LpNorm: Graph surgery to merge the Pow, ReduceSum, Pow, Div pattern into an L2 LpNorm
OnnxHqqQuantization: Implements 4-bit HQQ quantization
VitisAIAddMetaData: Adds metadata to an ONNX model based on specified model attributes.
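The two graph surgeries listed above rely on simple algebraic identities. The sketch below checks them numerically with plain Python lists (no ONNX involved): a MatMul followed by a bias Add equals Gemm with alpha = beta = 1, and Pow(2), ReduceSum, Pow(0.5), Div equals L2 normalization along the reduced axis. It illustrates the math only; it is not how Olive matches or rewrites the graph.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m: Matrix, bias: List[float]) -> Matrix:
    # Broadcast a 1-D bias across rows, as ONNX Add does.
    return [[v + c for v, c in zip(row, bias)] for row in m]


def gemm(a: Matrix, b: Matrix, c: List[float], alpha: float = 1.0, beta: float = 1.0) -> Matrix:
    # ONNX Gemm without transposes: Y = alpha * (A @ B) + beta * C.
    return [[alpha * v + beta * cc for v, cc in zip(row, c)] for row in matmul(a, b)]


def pow_reducesum_pow_div(row: List[float]) -> List[float]:
    # Pow(x, 2) -> ReduceSum -> Pow(., 0.5) -> Div(x, .)
    norm = sum(v ** 2 for v in row) ** 0.5
    return [v / norm for v in row]


def l2_normalize(row: List[float]) -> List[float]:
    # L2 normalization over the last axis.
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]
```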

New/Updated Examples

Alibaba-NLP/gte #1695
DeepSeek
- OpenVINO #1786
Google BERT
- QDQ #1701, #1718, #1733, #1797, #1817
- QNN #1764
- VitisAI #1728
Google VIT
- QDQ #1701, #1733, #1797, #1817
- QNN #1701, #1749
- VitisAI #1728
- OpenVINO #1757, #1767
Intel BERT
- QDQ #1797, #1817
- QNN #1749
- OpenVINO #1767, #1768, #1777, #1822
Laion Clip
- QDQ #1701, #1733, #1797
- QNN #1701, #1749
- VitisAI #1728
- OpenVINO #1793
Llama3
- OpenVINO #1786
Meta Llama3
- QDQ #1707
OpenAI Clip (16 and 32)
- QDQ #1701, #1733, #1797, #1817
- QNN #1701, #1764
- VitisAI #1728
- OpenVINO #1793
Phi3.5
- QDQ #1707, #1733, #1817
- VitisAI #1707, #1728
- OpenVINO #1786
Phi4
- OpenVINO #1828
Qwen
- QNN #1699
- OpenVINO #1786, #1828
Resnet50
- QDQ #1701, #1749, #1817
- QNN #1701, #1749
- OpenVINO #1757, #1767, #1786
Sentence Transformers CLIP
- QDQ #1797
- QNN #1694, #1797
Stable Diffusion
- QDQ #1730

Deprecated Examples

Mobilenet QNN #1743
Inception #1743

Deprecated Passes

InsertBeamSearchOp #1805

17 Mar 22:14

jambayk

v0.8.0

6ab9d8b

Olive-ai 0.8.0

New Features (Passes)

QuaRot performs offline weight rotation
SpinQuant performs offline weight rotation
StaticLLM converts a dynamic-shaped LLM into a static-shaped LLM for NPUs.
GraphSurgeries applies surgeries to ONNX model. Surgeries are modular and individually configurable.
LoHa, LoKr and DoRA finetuning
OnnxQuantizationPreprocess applies quantization preprocessing.
EPContextBinaryGenerator creates EP-specific context binary ONNX models.
ComposeOnnxModels composes split ONNX models.
OnnxIOFloat16ToFloat32 replaced with more generic OnnxIODataTypeConverter

Command Line Interface

New command line tools have been added and existing tools have been improved.

generate_config_file option to save the workflow config file.
extract-adapters command to extract multiple adapters from a PyTorch model.
Simplified quantize command

Improvements

Better output model structure for workflow and CLI runs.
- New no_artifacts option in workflow config to disable saving run artifacts such as footprints.
HF data preprocessing:
- Dataset is truncated if max_samples is set.
- Empty texts are filtered out.
- padding_side is configurable and defaults to "right".
SplitModel pass keeps QDQ nodes together in the same split.
OnnxPeepholeOptimizer: constant folding + onnxoptimizer added.
CaptureSplitInfo: Separate split for memory-intensive modules.
OnnxConversion:
- Dynamic shapes for dynamo export.
- optimize option to perform constant folding and redundancy elimination on dynamo-exported models.
GPTQ: Default wikitext calibration dataset. Patch to support newer versions of transformers.
MatMulNBitsToQDQ: nodes_to_exclude option.
SplitModel: split_assignments option to provide custom split assignments.
CaptureSplitInfo: block_to_split can be a single block (str) or multiple blocks (list).
OnnxMatMul4Quantizer: Support onnxruntime 1.18+
OnnxQuantization:
- Support onnxruntime 1.18+.
- op_types_to_exclude option.
- LLMAugmentedDataLoader augments the calibration data for LLMs with KV cache and other missing inputs.
New document theme and organization.
Reimplemented search logic to include passes in the search space.
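The HF data preprocessing changes listed above (truncation to max_samples, dropping empty text, right padding by default) can be pictured with a small pure-Python sketch. The function name, the order of filtering before truncation, and the whitespace "tokenizer" are hypothetical simplifications, not Olive's data config implementation.

```python
from typing import List, Optional


def preprocess_texts(
    texts: List[str],
    max_samples: Optional[int] = None,
    max_length: int = 8,
    pad_token: str = "<pad>",
    padding_side: str = "right",
) -> List[List[str]]:
    """Hypothetical sketch: filter, truncate and pad a list of raw texts."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    # Drop empty or whitespace-only texts (assumed to run before truncation).
    texts = [t for t in texts if t.strip()]
    # Keep at most max_samples examples when it is set.
    if max_samples is not None:
        texts = texts[:max_samples]
    batch = []
    for text in texts:
        # Whitespace splitting stands in for a real tokenizer.
        tokens = text.split()[:max_length]
        pad = [pad_token] * (max_length - len(tokens))
        batch.append(tokens + pad if padding_side == "right" else pad + tokens)
    return batch
```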

Examples:

New QNN EP examples:
- SLMs:
  - Phi-3.5
  - Deepseek R1 Distill
  - Llama 3.2
- MobileNet
- ResNet
- CLIP VIT
- BAAI/bge-small-en-v1.5
- Table Transformer Detection
- adetailer
Deepseek R1 Distill Finetuning
timm MobileNet

14 Nov 19:39

jambayk

v0.7.1.1

a2d32aa

Olive-ai 0.7.1.1

Same as 0.7.1 with updated dependencies for nvmo extra and NVIDIA TensorRT Model Optimizer example doc.

Refer to the 0.7.1 release notes for other details.

12 Nov 20:57

jambayk

v.0.7.1

9885cee

Olive-ai 0.7.1

Command Line Interface

New command line tools have been added and existing tools have been improved.

olive --help works as expected.
auto-opt:
- The command chooses a set of passes compatible with the provided model type, precision and accelerator information.
- New options to split a model, either using --num-splits or --cost-model.

Improvements

ExtractAdapters:
- Support LoRA adapter nodes in Stable Diffusion UNet or text-embedding models.
- Default initializers for quantized adapters so the model can run without adapter inputs.
GPTQ:
- Avoid saving unused bias weights (all zeros).
- Set use_exllama to False by default to allow exporting and fine-tuning external GPTQ checkpoints.
AWQ: Patch autoawq to run quantization on newer transformers versions.
Atomic SharedCache operations
New CaptureSplitInfo and Split passes to split models into components. Number of splits can be user provided or inferred from a cost model.
disable_search is deprecated from pass configuration in an olive workflow config.
OrtSessionParamsTuning redone to use olive search features.
OrtModelOptimizer renamed to OrtPeepholeOptimizer and some bug fixes.
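To give a feel for how the number of splits could be inferred from a cost model, the sketch below greedily groups consecutive layers so that no split exceeds a per-split cost budget. The per-layer costs, the budget and the greedy strategy are assumptions for illustration; the actual CaptureSplitInfo logic may differ.

```python
from typing import List


def assign_splits(layer_costs: List[int], max_cost_per_split: int) -> List[int]:
    """Return a split index for each layer, keeping layers contiguous.

    Greedy: start a new split whenever adding the next layer would exceed
    the budget. A single layer larger than the budget gets its own split.
    """
    assignments: List[int] = []
    split, used = 0, 0
    for cost in layer_costs:
        if used > 0 and used + cost > max_cost_per_split:
            split += 1
            used = 0
        assignments.append(split)
        used += cost
    return assignments
```

The number of splits is then `max(assignments) + 1`; a user-provided split count would bypass this inference entirely.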

Examples:

Stable Diffusion: New MultiLora Example
Phi3: New int quantization example using nvidia-modelopt

Releases: microsoft/Olive
