Support kv quant for llama4 example #2376
Open
User description
Type of Change
example update
Description
KV cache quantization is now supported in the Llama4 example.
PR Type
Enhancement
Description
- Added support for KV quantization in Llama4 example
- Introduced static_kv_dtype argument for key-value cache quantization
- Updated scripts to handle static_kv_dtype parameter
File Walkthrough
main.py
Enhance argument parsing and KV quantization
examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/main.py
- Replaced BasicArgumentParser with standard argparse.ArgumentParser
- Added static_kv_dtype argument for key-value cache quantization
- Passed static_kv_dtype to tune function
run_benchmark.sh
Update benchmark script for KV quantization
examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_benchmark.sh
- Added --static_kv_dtype parameter parsing
- Updated extra_model_args to include kv_cache_dtype
run_quant.sh
Update quantization script for KV quantization
examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_quant.sh
- Added --static_kv_dtype parameter parsing
- Added static_kv_dtype to extra_cmd
README.md
Document KV quantization support
examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/README.md
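The walkthrough above describes two mechanics: exposing static_kv_dtype through argparse.ArgumentParser, and forwarding it to the benchmark as kv_cache_dtype. The sketch below is a rough illustration of that flow, not the PR's actual code. The helper names build_parser and build_extra_model_args, the None default, the example value "fp8", and the tensor_parallel_size entry are all assumptions made for illustration. The real scripts do this forwarding in shell.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the --static_kv_dtype flag name comes from the PR."""
    parser = argparse.ArgumentParser(description="Llama4 quantization example (sketch)")
    # Assumption: the flag is optional and None means "do not quantize the KV cache".
    parser.add_argument(
        "--static_kv_dtype",
        type=str,
        default=None,
        help="Data type for static KV cache quantization (e.g. 'fp8', an assumed value).",
    )
    return parser


def build_extra_model_args(static_kv_dtype, base_args=""):
    """Append kv_cache_dtype to a comma-separated model-args string when it is set."""
    if not static_kv_dtype:
        return base_args
    kv_arg = f"kv_cache_dtype={static_kv_dtype}"
    return f"{base_args},{kv_arg}" if base_args else kv_arg


if __name__ == "__main__":
    args = build_parser().parse_args(["--static_kv_dtype", "fp8"])
    # tensor_parallel_size=4 is a placeholder for whatever arguments already exist.
    print(build_extra_model_args(args.static_kv_dtype, "tensor_parallel_size=4"))
```

Keeping the flag optional means existing invocations that omit it would behave as before, which is one plausible reason to default it to None.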