8 Commits

Author SHA1 Message Date
Lucas Wilkinson
288cc6c234
[Attention] MLA with chunked prefill (#12639)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-21 15:30:12 -08:00
Kaixi Hou
4fc5c23bb6
[NVIDIA] Support nvfp4 quantization (#12784) 2025-02-12 19:51:51 -08:00
Lucas Wilkinson
5288c06aa0
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174) 2024-08-20 07:09:33 -06:00
bnellnm
5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
Michael Goin
5f6d10c14c
[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
Hongxia Yang
6b7de1a030
[ROCm] add support to ROCm 6.0 and MI300 (#2274) 2024-01-26 12:41:10 -08:00
Mingcan Xiang
614856da25
Avoid multiple redefinition (#1817) 2023-12-14 09:35:58 -08:00
Yanming W
e0c6f556e8
[Build] Avoid building too many extensions (#1624) 2023-11-23 16:31:19 -08:00