vllm/docs/source/features/quantization/supported_hardware.md

(quantization-supported-hardware)=
# Supported Hardware
The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:
:::{list-table}
:header-rows: 1
:widths: 20 8 8 8 8 8 8 8 8 8 8
- * Implementation
* Volta
* Turing
* Ampere
* Ada
* Hopper
* AMD GPU
* Intel GPU
* x86 CPU
* AWS Inferentia
* Google TPU
- * AWQ
*
* ✅︎
* ✅︎
* ✅︎
* ✅︎
*
* ✅︎
* ✅︎
*
*
- * GPTQ
* ✅︎
* ✅︎
* ✅︎
* ✅︎
* ✅︎
*
* ✅︎
* ✅︎
*
*
- * Marlin (GPTQ/AWQ/FP8)
*
*
* ✅︎
* ✅︎
* ✅︎
*
*
*
*
*
- * INT8 (W8A8)
*
* ✅︎
* ✅︎
* ✅︎
* ✅︎
*
*
* ✅︎
*
*
- * FP8 (W8A8)
*
*
*
* ✅︎
* ✅︎
* ✅︎
*
*
*
*
- * AQLM
* ✅︎
* ✅︎
* ✅︎
* ✅︎
* ✅︎
*
*
*
*
*
- * bitsandbytes
* ✅︎
* ✅︎
* ✅︎
* ✅︎
* ✅︎
*
*
*
*
*
- * DeepSpeedFP
* ✅︎
* ✅︎
* ✅︎
* ✅︎
* ✅︎
*
*
*
*
*
- * GGUF
* ✅︎
* ✅︎
* ✅︎
* ✅︎
* ✅︎
* ✅︎
*
*
*
*
:::
- Volta refers to SM 7.0, Turing to SM 7.5, Ampere to SM 8.0/8.6, Ada to SM 8.9, and Hopper to SM 9.0.
- ✅︎ indicates that the quantization method is supported on the specified hardware.
- A blank cell indicates that the quantization method is not supported on the specified hardware.
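For scripting a quick pre-flight check, the NVIDIA GPU columns of the table above can be encoded as plain data. The helper below is a hypothetical sketch, not part of the vLLM API: `NVIDIA_SUPPORT` and `supported_methods` are illustrative names, and the support sets are transcribed directly from the table, keyed by (major, minor) compute capability.

```python
# Hypothetical helper (not part of vLLM): the NVIDIA GPU columns of the
# table above. Volta = (7, 0), Turing = (7, 5), Ampere = (8, 0)/(8, 6),
# Ada = (8, 9), Hopper = (9, 0).
AMPERE_AND_UP = {(8, 0), (8, 6), (8, 9), (9, 0)}
TURING_AND_UP = {(7, 5)} | AMPERE_AND_UP
VOLTA_AND_UP = {(7, 0)} | TURING_AND_UP

NVIDIA_SUPPORT = {
    "AWQ": TURING_AND_UP,
    "GPTQ": VOLTA_AND_UP,
    "Marlin (GPTQ/AWQ/FP8)": AMPERE_AND_UP,
    "INT8 (W8A8)": TURING_AND_UP,
    "FP8 (W8A8)": {(8, 9), (9, 0)},
    "AQLM": VOLTA_AND_UP,
    "bitsandbytes": VOLTA_AND_UP,
    "DeepSpeedFP": VOLTA_AND_UP,
    "GGUF": VOLTA_AND_UP,
}

def supported_methods(capability):
    """Return the methods from the table that support this (major, minor)
    compute capability, e.g. (8, 9) for an Ada GPU."""
    return sorted(m for m, caps in NVIDIA_SUPPORT.items() if capability in caps)

# Per the table, an Ada GPU (SM 8.9) supports FP8 (W8A8), while a
# Turing GPU (SM 7.5) does not.
print(supported_methods((8, 9)))
print(supported_methods((7, 5)))
```

At load time, vLLM normally picks up the quantization method from the model checkpoint's configuration; it can also be set explicitly via the `quantization` argument of `vllm.LLM` (e.g. `quantization="awq"`). The sketch above only mirrors the table and makes no claim about runtime behavior.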
:::{note}
This compatibility chart is subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.
For the most up-to-date information on hardware support and quantization methods, please refer to <gh-dir:vllm/model_executor/layers/quantization> or consult the vLLM development team.
:::