On-device LLM Inference Powered by X-Bit Quantization
natural-language-processing
compression
self-hosted
llama
language-models
quantization
language-model
gemma
mistral
model-compression
efficient-inference
llm
llms
generative-ai
large-language-model
llama2
mixtral
llm-inference
llama3
Updated Jun 24, 2024 - Python
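The repository's own "X-bit" quantization scheme isn't detailed in this listing, but the core idea behind low-bit LLM weight compression can be sketched as symmetric integer quantization: map float weights onto a small signed-integer grid with a single scale factor, then dequantize at inference time. The sketch below is illustrative only (function names and the per-tensor scaling choice are assumptions, not the repository's actual implementation):

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int = 4):
    """Illustrative symmetric per-tensor quantization to `bits`-bit signed ints."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit
    scale = float(np.max(np.abs(weights))) / qmax  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integer grid."""
    return q.astype(np.float32) * scale

# Round-trip a small random weight matrix through 4-bit quantization.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, s)
# With round-to-nearest, the per-weight error is at most half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

Real low-bit inference stacks (e.g. the GGUF/llama.cpp family used with Llama, Gemma, and Mistral models) refine this with per-group scales and packed storage, but the accuracy/size trade-off follows the same principle: fewer bits shrink the model and memory bandwidth at the cost of rounding error.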