Lucas Wilkinson | a8d604ca2a | [Misc] Disambiguate quantized types via a new ScalarType (#6396) | 2024-08-02 13:51:58 -07:00 |
HandH1998 | 6512937de1 | Support W4A8 quantization for vllm (#5218) | 2024-07-31 07:55:21 -06:00 |
Alexander Matveev | 75acdaa4b6 | [Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795) | 2024-07-27 17:52:33 -04:00 |
Alexander Matveev | 396d92d5e0 | [Kernel][Core] Add AWQ support to the Marlin kernel (#6612) | 2024-07-21 19:41:42 -04:00 |
Robert Shaw | b675069d74 | [ Misc ] Refactor Marlin Python Utilities (#6082) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> | 2024-07-11 15:40:11 +00:00 |
Michael Goin | 47f0954af0 | [Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975) | 2024-07-03 17:38:00 +00:00 |
Alexander Matveev | 6066253296 | Marlin 24 prefill performance improvement (about 25% better on average) (#4983) | 2024-05-23 02:39:27 -04:00 |
Alexander Matveev | 27ce85476e | [Kernel] Add marlin_24 unit tests (#4901) | 2024-05-19 11:37:34 -04:00 |
alexm-nm | 5c342570d7 | Add marlin unit tests and marlin benchmark script (#4815) | 2024-05-16 09:36:49 -04:00 |