| Author | Commit | Message | Date |
|---|---|---|---|
| Dipika Sikka | 60508ffda9 | [Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995); Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>; Co-authored-by: ilmarkov <markovilya197@gmail.com>; Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>; Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> | 2024-12-18 09:57:16 -05:00 |
| Luka Govedič | 8d59dbb000 | [Kernel] Add per-tensor and per-token AZP epilogues (#5941); Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2024-08-06 18:17:08 +00:00 |
| Varun Sundar Rabindranath | 35e9c12bfa | [Kernel] Tuned int8 Cutlass Kernels for SM75 (T4) (#6996); Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> | 2024-07-31 14:40:32 -07:00 |
| Varun Sundar Rabindranath | 766435e660 | [Kernel] Tuned FP8 Kernels for Ada Lovelace (#6677); Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> | 2024-07-29 09:42:35 -06:00 |
| Woosuk Kwon | a9a2e74d21 | [Misc] Use torch.Tensor for type annotation (#6505) | 2024-07-17 13:01:10 +00:00 |
| Michael Goin | 8065a7e220 | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| Tyler Michael Smith | 6820724e51 | [Bugfix] Fix w8a8 benchmarks for int8 case (#5643) | 2024-06-19 00:33:25 +00:00 |
| Cody Yu | e2b85cf86a | Fix w8a8 benchmark and add Llama-3-8B (#5562) | 2024-06-17 06:48:06 +00:00 |
| Tyler Michael Smith | 85657b5607 | [Kernel] Factor out epilogues from cutlass kernels (#5391); Co-authored-by: Michael Goin <michael@neuralmagic.com>; Co-authored-by: youkaichao <youkaichao@gmail.com>; Co-authored-by: zifeitong <zifei.tong@parasail.io>; Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> | 2024-06-13 11:22:19 -07:00 |
| Varun Sundar Rabindranath | f081c3ce4b | [Kernel] Update Cutlass fp8 configs (#5144); Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>; Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> | 2024-06-01 08:46:07 +00:00 |