
Add support for Quark onnx quantization #2236

Merged
xiaoyu-work merged 13 commits into microsoft:main from gengxinwu:integrate-quark-onnx on Nov 3, 2025

Conversation

@gengxinwu (Contributor) commented Oct 31, 2025

Describe your changes

What it does:

  1. The QuarkQuantization pass can now also quantize ONNX-format models through the olive run interface
  2. Supports int8/uint8/int16/uint16/int32/uint32/bf16/bfp16 quantization data types
  3. Supports commonly used algorithms such as CLE, SmoothQuant, AdaRound, and AdaQuant

What is next:

  1. Add integration with the olive quantize interface
  2. Integrate newer versions of amd-quark
  3. Support more data formats, including MX formats
  4. Support more algorithms, like GPTQ, QuaRot ...
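To make the first point concrete, a workflow config invoking the new pass on an ONNX model might look roughly like the sketch below. The field names mirror common Olive conventions but are illustrative assumptions, not taken from this PR; consult the pass's PassConfigParam definitions for the real schema.

```python
# Hypothetical Olive workflow config for the QuarkQuantization pass on an
# ONNX model. All keys under "config" are assumed names for illustration.
workflow_config = {
    "input_model": {
        "type": "ONNXModel",      # the PR extends the pass beyond HF models
        "model_path": "model.onnx",
    },
    "passes": {
        "quark": {
            "type": "QuarkQuantization",
            "config": {
                "quant_format": "QDQ",       # assumed knob
                "activation_type": "uint8",  # assumed knob
                "weight_type": "int8",       # assumed knob
                "algorithm": "CLE",          # assumed knob
            },
        },
    },
}

# Such a config would normally be handed to `olive run`; here we only build it.
print(workflow_config["passes"]["quark"]["type"])
```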

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documentation if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

@xiaoyu-work requested a review from Copilot November 3, 2025 18:18

Copilot AI left a comment

Pull Request Overview

This PR adds ONNX model quantization support to the Quark quantizer pass. The existing QuarkQuantization pass only supported HuggingFace models; now it supports both ONNX and HuggingFace models.

Key changes:

  • Extended QuarkQuantization pass to handle ONNXModelHandler in addition to HfModelHandler
  • Added new ONNX-specific quantization logic and configuration preparation utilities
  • Included test coverage for the new ONNX quantization functionality

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:

  • test/requirements-test.txt: Added amd-quark dependency, version 0.10
  • test/passes/quark_quantizer/__init__.py: Created package initialization file for quark quantizer tests
  • test/passes/quark_quantizer/test_quark_onnx_quantization.py: Added test case for static QDQ U8S8 quantization
  • olive/passes/quark_quantizer/quark_quantization.py: Extended pass to support ONNX models with new configuration parameters and a _run_quark_onnx method
  • olive/passes/quark_quantizer/onnx/__init__.py: Created package initialization file for ONNX quantizer
  • olive/passes/quark_quantizer/onnx/quantize_quark.py: Implemented ONNX model quantization using Quark's ModelQuantizer
  • olive/passes/quark_quantizer/onnx/configuration_preparation.py: Added configuration mapping utilities for converting dictionaries to Quark ONNX config objects
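The core of the quark_quantization.py change is dispatching on the input model handler type. A minimal sketch of that shape, with the handler classes stubbed out in place of Olive's real HfModelHandler and ONNXModelHandler (the HF method name below is an assumption; only _run_quark_onnx is named in this PR):

```python
# Stub handler classes standing in for Olive's real model handlers.
class HfModelHandler:
    pass

class ONNXModelHandler:
    pass

def run_quantization(model):
    """Route the model to the matching quantization path by handler type."""
    if isinstance(model, ONNXModelHandler):
        return "_run_quark_onnx"   # new ONNX path added by this PR
    if isinstance(model, HfModelHandler):
        return "_run_quark_hf"     # pre-existing HuggingFace path (name assumed)
    raise TypeError(f"Unsupported model handler: {type(model).__name__}")

print(run_quantization(ONNXModelHandler()))
```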

Comment on lines +197 to +201
else:
    # TODO(Gengxin): Configure the rest algorithms
    pass


Copilot AI Nov 3, 2025

This TODO comment indicates incomplete implementation. The update_algo_config function handles only AdaRoundConfig, AdaQuantConfig, CLEConfig, and SmoothQuantConfig, but the algorithm_mapping dictionary includes GPTQConfig, AutoMixprecisionConfig, and QuarotConfig which are not configured. Consider either implementing the missing algorithm configurations or documenting which algorithms are intentionally not yet supported.

Suggested change
else:
    # TODO(Gengxin): Configure the rest algorithms
    pass
elif isinstance(algo_config, GPTQConfig):
    # TODO: Implement configuration for GPTQConfig if/when fields are known
    logger.warning("GPTQConfig configuration is not yet supported. Using default values.")
elif isinstance(algo_config, AutoMixprecisionConfig):
    # TODO: Implement configuration for AutoMixprecisionConfig if/when fields are known
    logger.warning("AutoMixprecisionConfig configuration is not yet supported. Using default values.")
elif isinstance(algo_config, QuarotConfig):
    # TODO: Implement configuration for QuarotConfig if/when fields are known
    logger.warning("QuarotConfig configuration is not yet supported. Using default values.")
else:
    logger.warning("Unknown algorithm config type: %s. No configuration applied.", type(algo_config).__name__)
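An alternative to growing the isinstance chain is a dispatch table from config type to configurator, which makes "known but not yet supported" algorithms explicit in one place. A sketch with stand-in config classes (the real ones come from amd-quark's ONNX API; the configurator bodies are placeholders):

```python
import logging

logger = logging.getLogger(__name__)

# Stand-in config classes; the real ones are amd-quark's algorithm configs.
class CLEConfig: ...
class SmoothQuantConfig: ...
class GPTQConfig: ...

def _configure_cle(cfg):
    return "configured CLE"          # placeholder for real field updates

def _configure_smooth_quant(cfg):
    return "configured SmoothQuant"  # placeholder for real field updates

# Dispatch table: supported types map to a configurator, known-but-unsupported
# types map to None so they fall through to the warning path.
_CONFIGURATORS = {
    CLEConfig: _configure_cle,
    SmoothQuantConfig: _configure_smooth_quant,
    GPTQConfig: None,
}

def update_algo_config(algo_config):
    configurator = _CONFIGURATORS.get(type(algo_config))
    if configurator is None:
        logger.warning(
            "%s is not yet supported; using default values.",
            type(algo_config).__name__,
        )
        return None
    return configurator(algo_config)
```

Unknown types and not-yet-supported types both land on the warning branch, so adding a new algorithm is a one-line table entry rather than another elif.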

"exclude": PassConfigParam(
    type_=dict,
    default_value=None,
    description="List of nodes or subgraphs excluded from quantization. Default is None.",

Copilot AI Nov 3, 2025

The description states 'List of nodes or subgraphs' but the type is declared as dict. This is inconsistent. Either update the description to match the dict type (e.g., 'Dictionary defining nodes or subgraphs excluded from quantization') or change the type to list if it should actually be a list.

Suggested change
description="List of nodes or subgraphs excluded from quantization. Default is None.",
description="Dictionary defining nodes or subgraphs excluded from quantization. Default is None.",

@xiaoyu-work xiaoyu-work merged commit 2df25a7 into microsoft:main Nov 3, 2025
17 checks passed
skywall pushed a commit to NXP/eiq-olive that referenced this pull request Mar 10, 2026
Merge in AITEC/eiq-olive from feature/EITO-565-rebase-to-newest-version-of-olive-0.9.3 to main

* commit 'fd44fa6a51e382d59a88e4fceec49042b7e2caa5': (370 commits)
  ruff safe fixes
  update rebased on badge readme
  fix import
  things I've missed during rebasing
  ruff
  Revert "ruff stuff"
  ruff stuff
  Bump up version to 0.10.1
  Fix cache output model name bug (microsoft#2249)
  HfModelHandler: Check for tokenizer_config.json instead of try/else (microsoft#2247)
  Quantization: Keep embeddings tied in SelectiveMixedPrecision, Clean overrides (microsoft#2246)
  TieWordEmbeddings: return model when no tieing detected (microsoft#2242)
  Static Quantization: Always patch `MinMaxCalibrator` (microsoft#2241)
  Release branch 0.10.0
  Add custom onnx model name support for output dir (microsoft#2235)
  TieWordEmbeddings: unquantized and quantized support (microsoft#2240)
  Quantization: Embeddings quantization, new packing format, Rtn quantizer (microsoft#2238)
  Add support for Quark onnx quantization (microsoft#2236)
  Spelling fixes (microsoft#2234)
  LLMAugmentedDataLoader: No decode phase for non-GQA model (microsoft#2204)
  ...