
[Bug] The accuracy of the A16W16 quantized model is very poor if per_channel is True #21000

Closed
duanshengliu opened this issue Jun 11, 2024 · 3 comments
Assignees
Labels
quantization issues related to quantization

Comments

@duanshengliu
Contributor

duanshengliu commented Jun 11, 2024

Describe the issue

I am using quantize_static for quantization, and I found that when per_channel=True and both the weight and activation types are INT16, the accuracy of the quantized model drops significantly; it is normal when per_channel=False.

To reproduce

The issue can be reproduced using the relevant files in demo_1.zip. The reproduction commands and results are as follows:

  • per_channel=True, A16W16:
    python run.py --per_channel --weight_type int16 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

    cosine similarity: 0.546422
    mean absolute error: 1.7504646

  • per_channel=False, A16W16:
    python run.py --weight_type int16 --activation_type int16 --input_model mobilenetv2-7-infer.onnx --output_model mobilenetv2-7.quant.onnx --calibrate_dataset ./test_images/

    cosine similarity: 0.99867153 ✔️
    mean absolute error: 0.038529985 ✔️

In addition, I compared the case of A8W8, and the above issue did not occur.
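For reference, the two metrics reported above can be computed with helpers like the following (a minimal sketch with hypothetical function names; the actual comparison code in run.py from demo_1.zip may differ):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened output tensors."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_absolute_error(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error between float and quantized model outputs."""
    return float(np.mean(np.abs(a - b)))

# Identical outputs give similarity ~1.0 and error 0.0
x = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(x, x))   # ≈ 1.0
print(mean_absolute_error(x, x)) # 0.0
```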

Summary:

| cosine similarity | A16W16 | A8W8 |
| ----------------- | ------ | ---- |
| per_channel=True  | ❌     | ✔️   |
| per_channel=False | ✔️     | ✔️   |

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@github-actions github-actions bot added the quantization issues related to quantization label Jun 11, 2024
@duanshengliu
Contributor Author

@yihonglyu
According to your suggestion, I tried setting reduce_range=True, and the accuracy did improve greatly. The results are as follows:
cosine similarity: 0.9986828
mean absolute error: 0.03800502
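For context, here is a simplified sketch (not ONNX Runtime's exact code; the helper name and formula are assumptions) of what reduce_range changes mechanically: it gives up one bit of the quantization range, which roughly doubles the resulting scale.

```python
def symmetric_scale(rmax: float, bits: int, reduce_range: bool) -> float:
    """Hypothetical simplified symmetric scale computation.

    With reduce_range=True, one bit of the quantization range is
    sacrificed, so qmax is halved and the scale roughly doubles.
    """
    effective_bits = bits - 1 if reduce_range else bits
    qmax = 2 ** (effective_bits - 1) - 1
    return rmax / qmax

rmax = 3.2e-5  # a channel with a very narrow weight range
full = symmetric_scale(rmax, 16, reduce_range=False)     # rmax / 32767
reduced = symmetric_scale(rmax, 16, reduce_range=True)   # rmax / 16383
print(full, reduced)  # the reduced-range scale is about 2x larger
```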

So why does this work?

@yihonglyu
Contributor

When executing A16W16 per_channel, a warning is generated:

C:\Users\yilyu\AppData\Local\miniconda3\envs\21000\lib\site-packages\onnxruntime\quantization\base_quantizer.py:232: RuntimeWarning: invalid value encountered in cast

This warning suggests that the scale might be too small to fit into the int16 range. If the accuracy is within acceptable limits, it could be beneficial to avoid using per_channel.

@duanshengliu
Contributor Author

Thanks, I get it.
