Luka Govedič
|
8d59dbb000
|
[Kernel] Add per-tensor and per-token AZP epilogues (#5941)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-08-06 18:17:08 +00:00 |
|
Varun Sundar Rabindranath
|
af647fb8b3
|
[Kernel] Tuned int8 kernels for Ada Lovelace (#6848)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-07-29 20:24:58 -06:00 |
|
Varun Sundar Rabindranath
|
766435e660
|
[Kernel] Tuned FP8 Kernels for Ada Lovelace (#6677)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-07-29 09:42:35 -06:00 |
|
youkaichao
|
482045ee77
|
[hardware][misc] introduce platform abstraction (#6080)
|
2024-07-02 20:12:22 -07:00 |
|
youkaichao
|
614aa51203
|
[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007)
|
2024-06-30 20:07:34 -07:00 |
|
Tyler Michael Smith
|
6a2d659d28
|
[Bugfix] Fix compute datatype for cutlass 3.x epilogues (#5931)
|
2024-06-28 17:10:34 +00:00 |
|
Luka Govedič
|
5bfd1bbc98
|
[Kernel] Adding bias epilogue support for cutlass_scaled_mm (#5560)
Co-authored-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2024-06-26 15:16:00 +00:00 |
|
Cyrus Leung
|
0e9164b40a
|
[mypy] Enable type checking for test directory (#5017)
|
2024-06-15 04:45:31 +00:00 |
|
Tyler Michael Smith
|
85657b5607
|
[Kernel] Factor out epilogues from cutlass kernels (#5391)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: zifeitong <zifei.tong@parasail.io>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
|
2024-06-13 11:22:19 -07:00 |
|
Varun Sundar Rabindranath
|
f081c3ce4b
|
[Kernel] Update Cutlass fp8 configs (#5144)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
|
2024-06-01 08:46:07 +00:00 |
|
Tyler Michael Smith
|
260d119e86
|
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137)
|
2024-06-01 06:45:32 +00:00 |
|
Tyler Michael Smith
|
8674f9880e
|
[Kernel] Fixup for CUTLASS kernels in CUDA graphs (#4954)
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
|
2024-05-22 14:10:43 +00:00 |
|
Tyler Michael Smith
|
2060e93659
|
[Kernel] Add w8a8 CUTLASS kernels (#4749)
|
2024-05-16 18:32:50 -04:00 |
|