bnellnm | de6f90a13d | [Misc] guard against change in cuda library name (#8609) | 2024-09-20 06:36:30 +08:00 |
bnellnm | 73202dbe77 | [Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
| 2024-09-11 12:52:19 -07:00 |
Li, Jiang | 0b952af458 | [Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) | 2024-09-11 09:46:46 -07:00 |
Jee Jee Li | f80ab3521c | Clean up remaining Punica C information (#7027) | 2024-08-04 15:37:08 -07:00 |
Lucas Wilkinson | a8d604ca2a | [Misc] Disambiguate quantized types via a new ScalarType (#6396) | 2024-08-02 13:51:58 -07:00 |
Li, Jiang | 3bbb4936dc | [Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) | 2024-07-26 13:50:10 -07:00 |
Chip Kerchner | 38a1674abb | Support CPU inference with VSX PowerPC ISA (#5652) | 2024-06-26 21:53:04 +00:00 |
Matt Wong | dd793d1de5 | [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) | 2024-06-25 15:56:15 -07:00 |
Hongxia Yang | f758aed0e8 | [Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices (#5641) | 2024-06-18 23:21:29 -07:00 |
Jie Fu (傅杰) | ab66536dbf | [CI/BUILD] Support non-AVX512 vLLM building and testing (#5574) | 2024-06-17 14:36:10 -04:00 |
Jie Fu (傅杰) | cd9c0d65d9 | [Hardware][Intel] Support CPU inference with AVX2 ISA (#5452) | 2024-06-13 17:22:24 -06:00 |
bnellnm | 5467ac3196 | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
Cody Yu | c833101740 | [Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535) | 2024-05-09 18:04:17 -06:00 |
Matt Wong | 59a6abf3c9 | [Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782) | 2024-04-08 14:31:02 -07:00 |
Adrian Abeyta | 2ff767b513 | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
| 2024-04-03 14:15:55 -07:00 |
bigPYJ1151 | 0e3f06fe9c | [Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
| 2024-04-01 22:07:30 -07:00 |
mawong-amd | b6d103542c | [Kernel] Layernorm performance optimization (#3662) | 2024-03-30 14:26:38 -07:00 |
Simon Mo | 51c31bc10c | CMake build elf without PTX (#3739) | 2024-03-30 01:53:08 +00:00 |
bnellnm | 3ad438c66f | Fix build when nvtools is missing (#3698) | 2024-03-29 18:52:39 -07:00 |
SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
bnellnm | 9fdf3de346 | Cmake based build system (#2830) | 2024-03-18 15:38:33 -07:00 |