(quantization-index)=

# Quantization

Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.
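
To make the trade-off concrete, here is a minimal sketch in plain Python, not part of any library API. The helper names `weight_memory_gib`, `quantize_int8`, and `dequantize` are hypothetical, and the 7-billion-parameter model size is an assumed example. The estimate covers weights only and ignores activations, the KV cache, and per-group scale metadata, so real footprints will differ.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative only).

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Assumed example: a model with 7 billion parameters.
num_params = 7_000_000_000
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gib(num_params, bits):.1f} GiB")

# The precision cost: a round trip through INT8 does not reproduce the
# original values exactly, but the error stays within half a quantization step.
weights = [0.5, -1.0, 0.25, 0.013]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.6f}, max round-trip error={max_error:.6f}")
```

Halving the bits per weight halves the weight memory, while the round-trip error shows where precision is lost. The methods listed below differ mainly in how they choose scales and which tensors they quantize.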

```{toctree}
:caption: Contents
:maxdepth: 1

supported_hardware
auto_awq
bnb
gguf
int8
fp8
fp8_e5m2_kvcache
fp8_e4m3_kvcache
```