vllm / csrc / quantization / gptq_marlin
Latest commit: b00b33d77e "[Model][Quantization] HQQ support through Marlin kernel expansion" (#9766) by ElizaWszola, 2024-11-19 13:31:12 -08:00
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
awq_marlin_repack.cu    [CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)    2024-10-03 22:55:25 -04:00
gptq_marlin_repack.cu   [CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)    2024-10-03 22:55:25 -04:00
gptq_marlin.cu          [Model][Quantization] HQQ support through Marlin kernel expansion (#9766)          2024-11-19 13:31:12 -08:00
marlin_dtypes.cuh       [Kernel][Core] Add AWQ support to the Marlin kernel (#6612)                        2024-07-21 19:41:42 -04:00
marlin.cuh              [Kernel][Core] Add AWQ support to the Marlin kernel (#6612)                        2024-07-21 19:41:42 -04:00