Optimized/fused kernels for GEMV with 4-bit quantized weights #2207

EliasVansteenkiste · 2024-06-03T14:07:48Z

"The TorchAO kernel is optimized to speed up GEMV operations with 4-bit quantized weights."
source: https://mobiusml.github.io/whisper-static-cache-blog/

I was wondering whether there are any optimized kernels for 4-bit quantization in whisper.cpp.

Context: I want to test HQQ with 4-bit quantized weights in the whisper.cpp repository, and I am wondering how I should interpret the results. Is there room for improvement in terms of speed or not?

Additional references:
https://github.com/mobiusml/hqq
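
To make the question concrete, here is a minimal pure-Python sketch of what I mean by a "fused" 4-bit GEMV: the weights stay packed (two 4-bit codes per byte) and are decoded inside the dot product, so the full float matrix is never materialized. Everything here is an assumption for illustration: the function names (`quantize_row`, `gemv_q4_fused`, `dequantize_row`), the group size, the nibble order, and the per-group scale/zero-point scheme are hypothetical and are not the ggml block layout, the HQQ storage format, or the TorchAO kernel.

```python
# Illustrative sketch only: layout and names are assumptions, not the
# ggml, HQQ, or TorchAO formats. Requires an even group size and row
# lengths that are a multiple of the group size.
GROUP = 4  # hypothetical group size, kept tiny for readability


def quantize_row(row, group=GROUP):
    """Quantize one row to 4-bit codes with a per-group scale and zero-point."""
    packed, scales, zeros = bytearray(), [], []
    for g in range(0, len(row), group):
        chunk = row[g:g + group]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for constant groups
        codes = [min(15, max(0, round((w - lo) / scale))) for w in chunk]
        # Pack two codes per byte: even index in the low nibble.
        for i in range(0, group, 2):
            packed.append(codes[i] | (codes[i + 1] << 4))
        scales.append(scale)
        zeros.append(lo)
    return bytes(packed), scales, zeros


def dequantize_row(packed, scales, zeros, group=GROUP):
    """Unfused reference path: expand the packed row back to floats."""
    out = []
    for gi, (scale, zero) in enumerate(zip(scales, zeros)):
        for j in range(group // 2):
            byte = packed[gi * (group // 2) + j]
            out.append(zero + scale * (byte & 0x0F))
            out.append(zero + scale * (byte >> 4))
    return out


def gemv_q4_fused(rows, x, group=GROUP):
    """Compute y = W @ x directly from packed rows, decoding nibbles on the fly."""
    y = []
    for packed, scales, zeros in rows:
        acc = 0.0
        for gi, (scale, zero) in enumerate(zip(scales, zeros)):
            qx = 0.0  # sum of code * activation within the group
            sx = 0.0  # sum of activations within the group
            base = gi * group
            for j in range(group // 2):
                byte = packed[gi * (group // 2) + j]
                x0, x1 = x[base + 2 * j], x[base + 2 * j + 1]
                qx += (byte & 0x0F) * x0 + (byte >> 4) * x1
                sx += x0 + x1
            # Scale and zero-point are applied once per group, not per weight.
            acc += scale * qx + zero * sx
        y.append(acc)
    return y


W = [[0.1, -0.2, 0.3, 0.4, 1.0, 0.5, -1.0, 0.0],
     [2.0, 2.0, 2.0, 2.0, -0.5, 0.25, 0.75, 0.1]]
x = [1.0, -2.0, 0.5, 3.0, -1.0, 0.0, 2.0, 1.5]
rows = [quantize_row(r) for r in W]
y = gemv_q4_fused(rows, x)
```

The fused path gives the same result as calling `dequantize_row` and then taking a plain dot product, but it reads half a byte per weight instead of a full float. Since GEMV is usually limited by memory bandwidth, that is where I would expect a speedup to come from, if one is available.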
