vllm/docs/source/features/quantization/supported_hardware.md

(quantization-supported-hardware)=

# Supported Hardware

The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:

```{list-table}
:header-rows: 1
:widths: 20 8 8 8 8 8 8 8 8 8 8

* - Implementation
  - Volta
  - Turing
  - Ampere
  - Ada
  - Hopper
  - AMD GPU
  - Intel GPU
  - x86 CPU
  - AWS Inferentia
  - Google TPU
* - AWQ
  - ✗
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✅︎
  - ✅︎
  - ✗
  - ✗
* - GPTQ
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✅︎
  - ✅︎
  - ✗
  - ✗
* - Marlin (GPTQ/AWQ/FP8)
  - ✗
  - ✗
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✗
  - ✗
  - ✗
  - ✗
* - INT8 (W8A8)
  - ✗
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✗
  - ✅︎
  - ✗
  - ✗
* - FP8 (W8A8)
  - ✗
  - ✗
  - ✗
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✗
  - ✗
  - ✗
* - AQLM
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✗
  - ✗
  - ✗
  - ✗
* - bitsandbytes
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✗
  - ✗
  - ✗
  - ✗
* - DeepSpeedFP
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✗
  - ✗
  - ✗
  - ✗
* - GGUF
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✅︎
  - ✗
  - ✗
  - ✗
  - ✗
```

- Volta refers to SM 7.0, Turing to SM 7.5, Ampere to SM 8.0/8.6, Ada to SM 8.9, and Hopper to SM 9.0.
- "✅︎" indicates that the quantization method is supported on the specified hardware.
- "✗" indicates that the quantization method is not supported on the specified hardware.

```{note}
This compatibility chart is subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.

For the most up-to-date information on hardware support and quantization methods, please refer to <gh-dir:vllm/model_executor/layers/quantization> or consult with the vLLM development team.
```
[Doc][2/N] Reorganize Models and Usage sections (#11755) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2025-01-06 21:40:31 +08:00			`(quantization-supported-hardware)=`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00
[Doc][2/N] Reorganize Models and Usage sections (#11755) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2025-01-06 21:40:31 +08:00			`# Supported Hardware`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00
			`The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:`

[Doc] Convert list tables to MyST (#11594) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2024-12-29 15:56:22 +08:00			```{list-table}
			`:header-rows: 1`
			`:widths: 20 8 8 8 8 8 8 8 8 8 8`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00
[Doc] Convert list tables to MyST (#11594) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2024-12-29 15:56:22 +08:00			`* - Implementation`
			`- Volta`
			`- Turing`
			`- Ampere`
			`- Ada`
			`- Hopper`
			`- AMD GPU`
			`- Intel GPU`
			`- x86 CPU`
			`- AWS Inferentia`
			`- Google TPU`
			`* - AWQ`
			`- ✗`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`* - GPTQ`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`* - Marlin (GPTQ/AWQ/FP8)`
			`- ✗`
			`- ✗`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`* - INT8 (W8A8)`
			`- ✗`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`* - FP8 (W8A8)`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`* - AQLM`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`* - bitsandbytes`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`* - DeepSpeedFP`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
			`* - GGUF`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
			`- ✅︎`
[Doc] Update Quantization Hardware Support Documentation (#12025) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> 2025-01-14 12:37:52 +08:00			`- ✅︎`
[Doc] Convert list tables to MyST (#11594) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2024-12-29 15:56:22 +08:00			`- ✗`
			`- ✗`
			`- ✗`
			`- ✗`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00			```

			`- Volta refers to SM 7.0, Turing to SM 7.5, Ampere to SM 8.0/8.6, Ada to SM 8.9, and Hopper to SM 9.0.`
			`- "✅︎" indicates that the quantization method is supported on the specified hardware.`
			`- "✗" indicates that the quantization method is not supported on the specified hardware.`

[Doc][2/N] Reorganize Models and Usage sections (#11755) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2025-01-06 21:40:31 +08:00			```{note}
			`This compatibility chart is subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00
[Doc] Improve GitHub links (#11491) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2024-12-26 06:49:26 +08:00			`For the most up-to-date information on hardware support and quantization methods, please refer to <gh-dir:vllm/model_executor/layers/quantization> or consult with the vLLM development team.`
[Doc][2/N] Reorganize Models and Usage sections (#11755) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2025-01-06 21:40:31 +08:00			```