
Conversation

@mengniwang95 (Contributor) commented Dec 26, 2025

User description

Type of Change

Example update

Description

KV cache quantization is supported.


PR Type

Enhancement


Description

  • Added support for KV cache quantization in the Llama4 example

  • Introduced a static_kv_dtype argument for key-value cache quantization

  • Updated the scripts to handle the static_kv_dtype parameter (see the usage sketch below)
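
As a quick usage sketch (the exact flag spelling and accepted values are assumptions; only an FP8 path is implied by the run_benchmark.sh changes), quantizing with a static KV cache dtype could look like:

  # Hypothetical invocation: --static_kv_dtype is the flag added by this PR;
  # fp8 is an assumed value inferred from the FP8 KV-cache handling below,
  # and the model flag is a placeholder, not copied from the diff.
  bash run_quant.sh --input_model=<llama4_model_path> --static_kv_dtype=fp8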


Diagram Walkthrough

flowchart LR
  A["Update main.py"] -- "Add static_kv_dtype" --> B["Modify setup_parser"]
  B -- "Pass static_kv_dtype" --> C["Update tune function"]
  C -- "Add static_kv_dtype arg" --> D["Modify run_benchmark.sh"]
  D -- "Handle kv_cache_dtype" --> E["Modify run_quant.sh"]
  E -- "Pass kv_cache_dtype" --> F["Update README.md"]

File Walkthrough

Relevant files
Enhancement
main.py
Enhance argument parsing and add KV cache quantization

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/main.py

  • Replaced BasicArgumentParser with standard argparse.ArgumentParser
  • Added static_kv_dtype argument for key-value cache quantization
  • Passed static_kv_dtype to the tune function (see the invocation sketch below)
+59/-24 
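
A minimal sketch of how the updated example might be launched directly, assuming standard argparse flags (--model is a placeholder name, not confirmed by the diff; --scheme and --static_kv_dtype come from this PR):

  # Hypothetical direct invocation of the updated example.
  # --scheme MXFP4 matches the default quoted in the review below;
  # --model and the fp8 value are assumptions for illustration.
  python main.py --model <llama4_model_path> --scheme MXFP4 --static_kv_dtype fp8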
run_benchmark.sh
Update benchmark script for KV quantization                           

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_benchmark.sh

  • Added --static_kv_dtype parameter parsing
  • Updated extra_model_args to include kv_cache_dtype
  • Added conditional logic for the FP8 KV cache (sketched below)
+12/-2   
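
A sketch of the conditional logic described above, assuming the argument-parsing style common to these example scripts (variable names are illustrative, not copied from the diff):

  # Parse --static_kv_dtype from the CLI, then only request an FP8 KV cache
  # from the benchmark backend when it was explicitly set.
  for var in "$@"; do
      case $var in
          --static_kv_dtype=*)
              static_kv_dtype=$(echo "$var" | cut -f2 -d=)
          ;;
      esac
  done
  if [ "$static_kv_dtype" = "fp8" ]; then
      extra_model_args="${extra_model_args},kv_cache_dtype=fp8"
  fi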
run_quant.sh
Update quantization script for KV quantization                     

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_quant.sh

  • Added --static_kv_dtype parameter parsing
  • Conditionally added static_kv_dtype to extra_cmd (sketched below)
+14/-6   
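
The pass-through described above might look like this sketch (variable names assumed from the script's existing extra_cmd convention):

  # Forward the flag to main.py only when the user supplied it, so the
  # default (no KV cache quantization) is left to the Python side.
  if [ -n "$static_kv_dtype" ]; then
      extra_cmd="${extra_cmd} --static_kv_dtype ${static_kv_dtype}"
  fi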
Documentation
README.md
Document KV quantization support                                                 

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/README.md

  • Updated documentation to reflect KV quantization support
+1/-0     

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
@PRAgent4INC (Collaborator)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Typo

There is a typo in the help text for the --scheme argument. It should be "quantization" instead of "quantizaion".

default="MXFP4",
type=str,
help="quantizaion scheme."
Redundant Argument

The reloading=False argument in the tune function call is redundant if False is already the default; remove it unless it deliberately overrides a non-default value.

reloading=False,

@PRAgent4INC (Collaborator)

PR Code Suggestions ✨
