vllm/gptq at a10d3056da644c31e4ebf95a2b6ad65a626a7350 - vllm - Luminance Code Repo

20231088/vllm

Antoni Baum a10d3056da

[Core] Set linear_weights directly on the layer (#3977)

2024-04-11 16:35:51 -04:00

compat.cuh

Add GPTQ support (#916)

2023-12-15 03:04:22 -08:00

matrix_view.cuh

Add Support for 2/3/8-bit GPTQ Quantization Models (#2330)

2024-02-28 21:52:23 -08:00

q_gemm.cu

[Core] Set linear_weights directly on the layer (#3977)

2024-04-11 16:35:51 -04:00

qdq_2.cuh

Add Support for 2/3/8-bit GPTQ Quantization Models (#2330)

2024-02-28 21:52:23 -08:00

qdq_3.cuh

Add Support for 2/3/8-bit GPTQ Quantization Models (#2330)

2024-02-28 21:52:23 -08:00

qdq_4.cuh

Add Support for 2/3/8-bit GPTQ Quantization Models (#2330)

2024-02-28 21:52:23 -08:00

qdq_8.cuh

Add Support for 2/3/8-bit GPTQ Quantization Models (#2330)

2024-02-28 21:52:23 -08:00

qdq_util.cuh

Add GPTQ support (#916)

2023-12-15 03:04:22 -08:00