14 Commits

Author SHA1 Message Date
Luka Govedič
4f93dfe952
[torch.compile] Fuse RMSNorm with quant (#9138)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-11-08 21:20:08 +00:00
Luka Govedič
7937009a7e
[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-21 20:18:00 -04:00
bnellnm
5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
Michael Goin
5f6d10c14c
[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
Matt Wong
59a6abf3c9
[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782) 2024-04-08 14:31:02 -07:00
mawong-amd
b6d103542c
[Kernel] Layernorm performance optimization (#3662) 2024-03-30 14:26:38 -07:00
Jee Li
77af974b40
[FIX] Support non-zero CUDA devices in custom kernels (#1959) 2024-01-02 19:09:59 -08:00
ljss
e1054247ba
[Optimization] Implement fused add rmsnorm (#1667) 2023-11-18 18:18:02 -08:00
Woosuk Kwon
c1376e0f82
Change scheduler & input tensor shape (#1381) 2023-10-16 17:48:42 -07:00
Woosuk Kwon
8ce9c50d40
Avoid compiling kernels for double data type (#933) 2023-09-02 14:59:47 +09:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon
e070829ae8
Support bfloat16 data type (#54) 2023-05-03 14:09:44 -07:00
Woosuk Kwon
436e523bf1
Refactor attention kernels (#53) 2023-05-03 13:40:13 -07:00
Woosuk Kwon
09e9245478
Add custom kernel for RMS normalization (#16) 2023-04-01 00:51:22 +08:00