Lucas Wilkinson | a8d604ca2a | [Misc] Disambiguate quantized types via a new ScalarType (#6396) | 2024-08-02 13:51:58 -07:00
Tyler Michael Smith | 61a97c32f6 | [Kernel] Fix marlin divide-by-zero warnings (#6904) | 2024-07-30 01:26:07 +00:00
Tyler Michael Smith | b23ce92032 | [Bugfix] Fix CUDA version check for mma warning suppression (#5642) | 2024-06-18 23:48:49 +00:00
Tyler Michael Smith | 348616ac4b | [Kernel] Suppress mma.sp warning on CUDA 12.5 and later (#5401) | 2024-06-14 10:02:00 -07:00
bnellnm | 5467ac3196 | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00
Simon Mo | e9d3aa04f6 | Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (#5149) | 2024-05-30 22:00:26 -07:00
Alexander Matveev | 6d21fa1cad | [Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (#5136) | 2024-05-30 21:02:11 -05:00
Alexander Matveev | 6066253296 | Marlin 24 prefill performance improvement (about 25% better on average) (#4983) | 2024-05-23 02:39:27 -04:00
Michael Goin | 5f6d10c14c | [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) | 2024-05-22 07:18:41 +00:00
Alexander Matveev | 6979ade384 | Add GPTQ Marlin 2:4 sparse structured support (#4790) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> | 2024-05-16 12:56:15 -04:00