Lu Fang | 4068f4b5b5 | 2025-01-05 10:20:34 +09:00
[MISC] Replace c10::optional with std::optional (#11730)
Signed-off-by: Lu Fang <lufang@fb.com>

wchen61 | 5dba257506 | 2025-01-02 22:58:56 +00:00
Resolve race conditions in Marlin kernel (#11493)
Signed-off-by: wchen61 <wchen61@foxmail.com>

Tyler Michael Smith | 970d6d0776 | 2024-12-30 17:22:13 +08:00
[Build][Kernel] Update CUTLASS to v3.6.0 (#11607)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

Varun Sundar Rabindranath | 8936316d58 | 2024-12-19 07:00:18 +00:00
[Kernel] Refactor Cutlass c3x (#10049)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

Dipika Sikka | 60508ffda9 | 2024-12-18 09:57:16 -05:00
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>

Luka Govedič | 30870b4f66 | 2024-12-13 03:19:23 +00:00
[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

Tyler Michael Smith | e2251109c7 | 2024-11-26 22:55:32 -08:00
[Kernel] Remove if-else with identical branches in marlin 2:4 (#10687)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

kliuae | 7c25fe45a6 | 2024-11-22 21:14:49 -08:00
[AMD] Add support for GGUF quantization on ROCm (#10254)

Lucas Wilkinson | d200972e7f | 2024-11-19 19:40:33 -08:00
[Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

ElizaWszola | b00b33d77e | 2024-11-19 13:31:12 -08:00
[Model][Quantization] HQQ support through Marlin kernel expansion (#9766)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>

Lucas Wilkinson | 96d999fbe8 | 2024-11-18 12:59:29 -07:00
[Kernel] Initial Machete W4A8 support + Refactors (#9855)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

Luka Govedič | 4f93dfe952 | 2024-11-08 21:20:08 +00:00
[torch.compile] Fuse RMSNorm with quant (#9138)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@126.com>

Aaron Pham | 21063c11c7 | 2024-11-06 07:11:55 +00:00
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

Lucas Wilkinson | d1e8240875 | 2024-10-22 15:41:13 -07:00
[Bugfix] Fix spurious "No compiled cutlass_scaled_mm ..." for W8A8 on Turing (#9487)

bnellnm | eca2c5f7c0 | 2024-10-17 19:08:34 +00:00
[Bugfix] Fix support for dimension like integers and ScalarType (#9299)

rasmith | 92d86da217 | 2024-10-17 01:34:06 +00:00
[BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels (#9391)

Tyler Michael Smith | c3fab5f769 | 2024-10-16 23:46:06 +00:00
[Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token quantize kernel (#9425)

Lucas Wilkinson | 18511aeda6 | 2024-10-10 17:39:56 +00:00
[Bugfix] Fix Machete unittests failing with NotImplementedError (#9218)

Lucas Wilkinson | a64e7b9407 | 2024-10-10 14:16:17 +08:00
[Bugfix] Machete garbage results for some models (large K dim) (#9212)

ElizaWszola | 05d686432f | 2024-10-04 12:34:44 -06:00
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>

Lucas Wilkinson | aeb37c2a72 | 2024-10-03 22:55:25 -04:00
[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)

Kevin H. Luu | aaccca2b4d | 2024-10-01 03:33:12 +00:00
[CI/Build] Fix machete generated kernel files ordering (#8976)
Signed-off-by: kevin <kevin@anyscale.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

Lucas Wilkinson | 86e9c8df29 | 2024-09-23 13:46:26 -04:00
[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

Luka Govedič | 5d73ae49d6 | 2024-09-16 11:52:40 -07:00
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)

sasha0552 | 781e3b9a42 | 2024-09-16 12:15:57 -06:00
[Bugfix][Kernel] Fix build for sm_60 in GGUF kernel (#8506)

Isotr0py | fc990f9795 | 2024-09-15 16:51:44 -06:00
[Bugfix][Kernel] Add IQ1_M quantization implementation to GGUF kernel (#8357)

bnellnm | 73202dbe77 | 2024-09-11 12:52:19 -07:00
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>

Dipika Sikka | 23f322297f | 2024-09-06 16:29:03 -06:00
[Misc] Remove SqueezeLLM (#8220)

Lucas Wilkinson | 55d63b1211 | 2024-08-22 08:28:52 -04:00
[Bugfix] Don't build machete on cuda <12.0 (#7757)

Luka Govedič | 7937009a7e | 2024-08-21 20:18:00 -04:00
[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)
Co-authored-by: Michael Goin <michael@neuralmagic.com>

Lucas Wilkinson | 5288c06aa0 | 2024-08-20 07:09:33 -06:00
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)

bnellnm | 37fd47e780 | 2024-08-16 14:00:11 -07:00
[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)

Charlie Fu | e837b624f2 | 2024-08-16 10:06:30 -07:00
[Feature][Hardware][Amd] Add fp8 Linear Layer for Rocm (#7210)

Lucas Wilkinson | 6aa33cb2dd | 2024-08-12 14:40:13 -04:00
[Misc] Use scalar type to dispatch to different gptq_marlin kernels (#7323)

Luka Govedič | 8d59dbb000 | 2024-08-06 18:17:08 +00:00
[Kernel] Add per-tensor and per-token AZP epilogues (#5941)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

Isotr0py | 360bd67cf0 | 2024-08-05 17:54:23 -06:00
[Core] Support loading GGUF model (#5191)
Co-authored-by: Michael Goin <michael@neuralmagic.com>

Tyler Michael Smith | 6e4852ce28 | 2024-08-05 16:00:01 -04:00
[CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001)

Tyler Michael Smith | 8571ac4672 | 2024-08-05 15:13:43 -04:00
[Kernel] Update CUTLASS to 3.5.1 (#7085)

Lucas Wilkinson | a8d604ca2a | 2024-08-02 13:51:58 -07:00
[Misc] Disambiguate quantized types via a new ScalarType (#6396)

Varun Sundar Rabindranath | 35e9c12bfa | 2024-07-31 14:40:32 -07:00
[Kernel] Tuned int8 Cutlass Kernels for SM75 (T4) (#6996)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

Varun Sundar Rabindranath | 93548eb37e | 2024-07-31 14:40:22 -07:00
[Kernel] Enable FP8 Cutlass for Ada Lovelace (#6950)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

HandH1998 | 6512937de1 | 2024-07-31 07:55:21 -06:00
Support W4A8 quantization for vllm (#5218)

Tyler Michael Smith | cbbc904470 | 2024-07-30 13:50:42 -04:00
[Kernel] Squash a few more warnings (#6914)

Varun Sundar Rabindranath | af647fb8b3 | 2024-07-29 20:24:58 -06:00
[Kernel] Tuned int8 kernels for Ada Lovelace (#6848)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

Tyler Michael Smith | 61a97c32f6 | 2024-07-30 01:26:07 +00:00
[Kernel] Fix marlin divide-by-zero warnings (#6904)

Tyler Michael Smith | aae6d36f7e | 2024-07-29 18:01:17 -06:00
[Kernel] Remove unused variables in awq/gemm_kernels.cu (#6908)

Tyler Michael Smith | 60d1c6e584 | 2024-07-29 09:59:02 -07:00
[Kernel] Fix deprecation function warnings squeezellm quant_cuda_kernel (#6901)

Varun Sundar Rabindranath | 766435e660 | 2024-07-29 09:42:35 -06:00
[Kernel] Tuned FP8 Kernels for Ada Lovelace (#6677)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

Alexander Matveev | 75acdaa4b6 | 2024-07-27 17:52:33 -04:00
[Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795)

Lucas Wilkinson | 55712941e5 | 2024-07-27 02:27:44 +00:00
[Bug Fix] Illegal memory access, FP8 Llama 3.1 405b (#6852)