I was wondering if there are any optimized kernels for 4-bit quantization in whisper.cpp?

Context: I want to test out HQQ with 4-bit quantized weights in the whisper.cpp repository, and I am wondering how I should interpret the results. Is there room for improvement in terms of speed or not?
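For reference, the kind of 4-bit quantization whisper.cpp applies can be sketched in a few lines. This is a hedged illustration, not whisper.cpp's actual C implementation: symmetric per-block quantization to the range [-8, 7] with one scale per block, loosely modeled on ggml's Q4_0 layout (block size, rounding, and packing details are assumptions here).

```python
import numpy as np

def quantize_q4(w, block=32):
    """Symmetric 4-bit block quantization (Q4_0-style sketch).
    Each block of `block` weights shares one float scale; values map to [-8, 7]."""
    wb = w.reshape(-1, block)
    scale = np.abs(wb).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(wb / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q, scale):
    """Reconstruct float weights from 4-bit codes and per-block scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q4(w)
w_hat = dequantize_q4(q, s)
err = np.abs(w - w_hat).mean()  # quantization error introduced by 4-bit storage
```

The point for interpreting benchmarks: 4-bit storage trades a small, measurable reconstruction error like `err` above for a ~4x smaller weight footprint, and any speedup then depends on whether the matmul kernels exploit that compressed format.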
"The TorchAO kernel is optimized to speed up GEMV operations with 4-bit quantized weights."
source: https://mobiusml.github.io/whisper-static-cache-blog/
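What such an optimized GEMV kernel does, conceptually, is fuse dequantization into the matrix-vector product so the full-precision weight matrix is never materialized. A minimal sketch, assuming the same hypothetical Q4_0-style block layout as above (this is illustrative Python, not the TorchAO or ggml kernel):

```python
import numpy as np

def pack_rows_q4(W, block=32):
    """Pack each matrix row into 4-bit blocks with a per-block scale
    (assumed symmetric quantization to [-8, 7])."""
    rows, cols = W.shape
    Wb = W.reshape(rows, cols // block, block)
    s = np.abs(Wb).max(axis=2, keepdims=True) / 7.0
    s[s == 0] = 1.0
    q = np.clip(np.round(Wb / s), -8, 7).astype(np.int8)
    return q, s

def gemv_q4(q, s, x, block=32):
    """y = W @ x computed directly on 4-bit blocks: each block is scaled and
    dotted with the matching slice of x, fusing dequantization into the GEMV."""
    rows = q.shape[0]
    xb = x.reshape(-1, block)
    y = np.zeros(rows, dtype=np.float32)
    for r in range(rows):
        for b in range(xb.shape[0]):
            y[r] += s[r, b, 0] * (q[r, b].astype(np.float32) @ xb[b])
    return y

W = np.random.randn(8, 64).astype(np.float32)
x = np.random.randn(64).astype(np.float32)
q, s = pack_rows_q4(W)
y = gemv_q4(q, s, x)
```

A real kernel would additionally keep the codes packed two-per-byte and vectorize the inner loop; whether whisper.cpp's existing Q4 kernels already do this as efficiently as the TorchAO one is exactly the question here.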
Additional references:
https://github.com/mobiusml/hqq