(quantization-supported-hardware)=

# Supported Hardware

The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:

:::{list-table}
:header-rows: 1
:widths: 20 8 8 8 8 8 8 8 8 8 8

- * Implementation
  * Volta
  * Turing
  * Ampere
  * Ada
  * Hopper
  * AMD GPU
  * Intel GPU
  * x86 CPU
  * AWS Inferentia
  * Google TPU
- * AWQ
  * ❌
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ✅︎
  * ✅︎
  * ❌
  * ❌
- * GPTQ
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ✅︎
  * ✅︎
  * ❌
  * ❌
- * Marlin (GPTQ/AWQ/FP8)
  * ❌
  * ❌
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ❌
  * ❌
  * ❌
  * ❌
- * INT8 (W8A8)
  * ❌
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ❌
  * ✅︎
  * ❌
  * ✅︎
- * FP8 (W8A8)
  * ❌
  * ❌
  * ❌
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ❌
  * ❌
  * ❌
- * AQLM
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ❌
  * ❌
  * ❌
  * ❌
- * bitsandbytes
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ❌
  * ❌
  * ❌
  * ❌
- * DeepSpeedFP
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ❌
  * ❌
  * ❌
  * ❌
- * GGUF
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ✅︎
  * ❌
  * ❌
  * ❌
  * ❌
:::

- Volta refers to SM 7.0, Turing to SM 7.5, Ampere to SM 8.0/8.6, Ada to SM 8.9, and Hopper to SM 9.0.
- ✅︎ indicates that the quantization method is supported on the specified hardware.
- ❌ indicates that the quantization method is not supported on the specified hardware.

:::{note}
This compatibility chart is subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.

For the most up-to-date information on hardware support and quantization methods, please consult the vLLM development team.
:::
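
The chart can also be queried programmatically. The following is a minimal sketch, not part of vLLM: the names `NVIDIA_ARCH_BY_SM`, `PLATFORMS`, `SUPPORT_ROWS`, `arch_from_sm`, `is_supported`, and `methods_for` are hypothetical helpers introduced here for illustration, and the data is transcribed by hand from the table and the SM mapping above, so it is only as current as this page.

```python
# Hypothetical lookup helpers; none of these names exist in vLLM.
# Data is transcribed from the compatibility table on this page.

# SM (compute capability) -> NVIDIA architecture name, as listed above.
NVIDIA_ARCH_BY_SM = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada",
    (9, 0): "Hopper",
}

# Column order of the table (excluding the "Implementation" column).
PLATFORMS = [
    "Volta", "Turing", "Ampere", "Ada", "Hopper",
    "AMD GPU", "Intel GPU", "x86 CPU", "AWS Inferentia", "Google TPU",
]

# One character per platform column: "y" = supported, "n" = not supported.
SUPPORT_ROWS = {
    "AWQ":                   "nyyyynyynn",
    "GPTQ":                  "yyyyynyynn",
    "Marlin (GPTQ/AWQ/FP8)": "nnyyynnnnn",
    "INT8 (W8A8)":           "nyyyynnyny",
    "FP8 (W8A8)":            "nnnyyynnnn",
    "AQLM":                  "yyyyynnnnn",
    "bitsandbytes":          "yyyyynnnnn",
    "DeepSpeedFP":           "yyyyynnnnn",
    "GGUF":                  "yyyyyynnnn",
}


def arch_from_sm(major, minor):
    """Return the architecture name for an SM version, or None if unlisted."""
    return NVIDIA_ARCH_BY_SM.get((major, minor))


def is_supported(method, platform):
    """Return True if the table marks `method` as supported on `platform`."""
    row = SUPPORT_ROWS[method]
    return row[PLATFORMS.index(platform)] == "y"


def methods_for(platform):
    """List every quantization method the table marks as supported."""
    return [m for m in SUPPORT_ROWS if is_supported(m, platform)]


if __name__ == "__main__":
    arch = arch_from_sm(8, 9)
    print(arch, methods_for(arch))
```

On a machine with PyTorch and an NVIDIA GPU, the `(major, minor)` pair could come from `torch.cuda.get_device_capability()`. SM versions not listed on this page return `None` from `arch_from_sm`, so callers should handle that case rather than assume support.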