vllm/quantization at e31045f95ca0f7262b156cd7e3e34100cbf1f4d1 - vllm - Luminance Code Repo

20231088/vllm

Michael Goin ed37599544

Update supported_hardware.md for TPU INT8 (#16437)

2025-04-11 12:28:07 +08:00

auto_awq.md

[Docs] Add GPTQModel (#14056)

2025-03-03 21:59:09 +00:00

bnb.md

[Misc] Auto detect bitsandbytes pre-quantized models (#16027)

2025-04-04 23:30:45 -07:00

fp8.md

[Doc] Convert docs to use colon fences (#12471)

2025-01-29 11:38:29 +08:00

gguf.md

doc: fix some typos in doc (#16154)

2025-04-07 05:32:06 +00:00

gptqmodel.md

[Docs] Add GPTQModel (#14056)

2025-03-03 21:59:09 +00:00

index.md

Torchao (#14231)

2025-04-07 19:39:28 -04:00

int4.md

[Doc] int4 w4a16 example (#12585)

2025-01-31 15:38:48 -08:00

int8.md

[Doc] int4 w4a16 example (#12585)

2025-01-31 15:38:48 -08:00

quantized_kvcache.md

[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)

2025-01-23 18:04:03 +00:00

quark.md

[Doc] Quark quantization documentation (#15861)

2025-04-01 08:32:45 -07:00

supported_hardware.md

Update supported_hardware.md for TPU INT8 (#16437)

2025-04-11 12:28:07 +08:00

torchao.md

Torchao (#14231)

2025-04-07 19:39:28 -04:00