(quantization-index)=

# Quantization
Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.
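
As a rough illustration of that trade-off, the sketch below estimates the memory needed for model weights alone at a few bit widths. The helper `weight_memory_gb` and the 7-billion-parameter size are hypothetical and used only for the arithmetic. The estimate ignores activations, the KV cache, and any per-group scale or zero-point overhead that a real quantization scheme adds.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical 7-billion-parameter model, chosen only for illustration.
num_params = 7e9

for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name:>6}: {weight_memory_gb(num_params, bits):.1f} GB")
```

Under these assumptions, halving the bits per weight halves the weight memory, which is why lower-precision formats can fit a model on hardware with less memory.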
:::{toctree}
:caption: Contents
:maxdepth: 1
supported_hardware
auto_awq
bnb
gguf
int8
fp8
quantized_kvcache
:::