Add support for Quark onnx quantization #2236
Conversation
Pull Request Overview
This PR adds ONNX model quantization support to the Quark quantizer pass. The existing QuarkQuantization pass only supported HuggingFace models; now it supports both ONNX and HuggingFace models.
Key changes:
- Extended QuarkQuantization pass to handle ONNXModelHandler in addition to HfModelHandler
- Added new ONNX-specific quantization logic and configuration preparation utilities
- Included test coverage for the new ONNX quantization functionality
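The dispatch described above can be sketched as follows. This is an illustrative stand-in, not Olive's actual code: the handler classes are mocked here, and the method names on the real pass may differ.

```python
class HfModelHandler:  # stand-in for Olive's HfModelHandler
    pass


class ONNXModelHandler:  # stand-in for Olive's ONNXModelHandler
    pass


def run_quark_quantization(model):
    """Route the model to the backend matching its handler type."""
    if isinstance(model, ONNXModelHandler):
        return "onnx"  # the real pass would call _run_quark_onnx here
    if isinstance(model, HfModelHandler):
        return "hf"  # the real pass would run the existing HF path
    raise TypeError(f"Unsupported model handler: {type(model).__name__}")
```

With this shape, adding another model type later only requires one more `isinstance` branch rather than changes to callers.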
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/requirements-test.txt | Added amd-quark dependency version 0.10 |
| test/passes/quark_quantizer/__init__.py | Created package initialization file for quark quantizer tests |
| test/passes/quark_quantizer/test_quark_onnx_quantization.py | Added test case for static QDQ U8S8 quantization |
| olive/passes/quark_quantizer/quark_quantization.py | Extended pass to support ONNX models with new configuration parameters and _run_quark_onnx method |
| olive/passes/quark_quantizer/onnx/__init__.py | Created package initialization file for ONNX quantizer |
| olive/passes/quark_quantizer/onnx/quantize_quark.py | Implemented ONNX model quantization using Quark's ModelQuantizer |
| olive/passes/quark_quantizer/onnx/configuration_preparation.py | Added configuration mapping utilities for converting dictionaries to Quark ONNX config objects |
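A configuration-preparation utility like the one in configuration_preparation.py can be sketched as below. The config class and its field names are assumptions for illustration; the real amd-quark ONNX config objects differ.

```python
from dataclasses import dataclass, fields


# Illustrative stand-in for a Quark ONNX quantization config object;
# the actual amd-quark class and field names are assumptions here.
@dataclass
class QuantConfig:
    calibrate_method: str = "MinMax"
    quant_format: str = "QDQ"
    activation_type: str = "uint8"
    weight_type: str = "int8"


def dict_to_config(config_dict):
    """Map known keys from a plain dict onto a config object, ignoring the rest."""
    known = {f.name for f in fields(QuantConfig)}
    return QuantConfig(**{k: v for k, v in config_dict.items() if k in known})
```

Ignoring unknown keys keeps the mapping forward-compatible when the pass config carries parameters meant for other backends.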
    else:
        # TODO(Gengxin): Configure the rest algorithms
        pass
This TODO comment indicates an incomplete implementation. The update_algo_config function handles only AdaRoundConfig, AdaQuantConfig, CLEConfig, and SmoothQuantConfig, while the algorithm_mapping dictionary also includes GPTQConfig, AutoMixprecisionConfig, and QuarotConfig, which are never configured. Consider either implementing the missing algorithm configurations or documenting which algorithms are intentionally not yet supported.
Suggested change:

    - else:
    -     # TODO(Gengxin): Configure the rest algorithms
    -     pass
    + elif isinstance(algo_config, GPTQConfig):
    +     # TODO: Implement configuration for GPTQConfig if/when fields are known
    +     logger.warning("GPTQConfig configuration is not yet supported. Using default values.")
    + elif isinstance(algo_config, AutoMixprecisionConfig):
    +     # TODO: Implement configuration for AutoMixprecisionConfig if/when fields are known
    +     logger.warning("AutoMixprecisionConfig configuration is not yet supported. Using default values.")
    + elif isinstance(algo_config, QuarotConfig):
    +     # TODO: Implement configuration for QuarotConfig if/when fields are known
    +     logger.warning("QuarotConfig configuration is not yet supported. Using default values.")
    + else:
    +     logger.warning("Unknown algorithm config type: %s. No configuration applied.", type(algo_config).__name__)
| "exclude": PassConfigParam( | ||
| type_=dict, | ||
| default_value=None, | ||
| description="List of nodes or subgraphs excluded from quantization. Default is None.", |
The description says 'List of nodes or subgraphs' but the type is declared as dict, which is inconsistent. Either update the description to match the dict type (e.g., 'Dictionary defining nodes or subgraphs excluded from quantization') or change the type to list if it should actually be a list.
Suggested change:

    - description="List of nodes or subgraphs excluded from quantization. Default is None.",
    + description="Dictionary defining nodes or subgraphs excluded from quantization. Default is None.",
Merge in AITEC/eiq-olive from feature/EITO-565-rebase-to-newest-version-of-olive-0.9.3 to main

* commit 'fd44fa6a51e382d59a88e4fceec49042b7e2caa5': (370 commits)
  ruff safe fixes
  update
  rebased on badge readme fix
  import things I've missed during rebasing
  ruff
  Revert "ruff stuff"
  ruff stuff
  Bump up version to 0.10.1
  Fix cache output model name bug (microsoft#2249)
  HfModelHandler: Check for tokenizer_config.json instead of try/else (microsoft#2247)
  Quantization: Keep embeddings tied in SelectiveMixedPrecision, Clean overrides (microsoft#2246)
  TieWordEmbeddings: return model when no tieing detected (microsoft#2242)
  Static Quantization: Always patch `MinMaxCalibrator` (microsoft#2241)
  Release branch 0.10.0
  Add custom onnx model name support for output dir (microsoft#2235)
  TieWordEmbeddings: unquantized and quantized support (microsoft#2240)
  Quantization: Embeddings quantization, new packing format, Rtn quantizer (microsoft#2238)
  Add support for Quark onnx quantization (microsoft#2236)
  Spelling fixes (microsoft#2234)
  LLMAugmentedDataLoader: No decode phase for non-GQA model (microsoft#2204)
  ...
Describe your changes
What it does: Quark ONNX quantization through the `olive run` interface.
What is next: support through the `olive quantize` interface.
Checklist before requesting a review
- `lintrunner -a`
(Optional) Issue link