20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Lucas Wilkinson	103bd17ac5	[Build] Only build 9.0a for scaled_mm and sparse kernels (#12339) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-01-27 10:40:00 -05:00
tvirolai-amd	cd9d06fb8d	Allow hip sources to be directly included when compiling for rocm. (#12087)	2025-01-15 16:46:03 -05:00
Wallas Henrique	cfd3219f58	[Hardware][Apple] Native support for macOS Apple Silicon (#11696) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-08 16:35:49 +08:00
Sanket Kale	a6760f6456	[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228) Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-11-25 18:32:39 -08:00
Manjul Mohan	1ea291a417	Fix: Build error seen on Power Architecture (#10421) Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com> Signed-off-by: B-201 <Joy25810@foxmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: ismael-dm <ismaeldm99@gmail.com> Signed-off-by: Andrew Nesbitt <andrewnez@gmail.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: yan ma <yan.ma@intel.com> Signed-off-by: Angus Wang <wangjadehao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: rickyx <rickyx@anyscale.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Manjul Mohan manjul.mohan@ibm.com <manjulmohan@ltcd97-lp2.aus.stglabs.ibm.com> Co-authored-by: B-201 <Joy25810@foxmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: ismael-dm <ismaeldm99@gmail.com> Co-authored-by: Andrew Nesbitt <andrewnez@gmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Yan Ma <yan.ma@intel.com> Co-authored-by: Angus Wang <wangjadehao@gmail.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Ricky Xu <rickyx@anyscale.com> Co-authored-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2024-11-19 09:34:57 -08:00
Li, Jiang	a6f332d0d9	[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target (#10108) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-07 18:42:50 +08:00
Li, Jiang	a4b3e0c1e9	[Hardware][CPU] Update torch 2.5 (#9911) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-07 04:43:08 +00:00
bnellnm	3cb07a36a2	[Misc] Upgrade to pytorch 2.5 (#9588) Signed-off-by: Bill Nell <bill@neuralmagic.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-27 09:44:24 +00:00
Li, Jiang	5eda21e773	[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344)	2024-10-17 12:21:04 -04:00
Varad Ahirwadkar	e5dc713c23	[Hardware][PowerPC] Make oneDNN dependency optional for Power (#9039) Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com>	2024-10-04 17:24:42 +00:00
Lucas Wilkinson	aeb37c2a72	[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)	2024-10-03 22:55:25 -04:00
Luka Govedič	57a0702e63	[Bugfix] Fix CPU CMake build (#8723) Co-authored-by: Yuan <yuan.zhou@intel.com>	2024-09-22 20:40:46 -07:00
Luka Govedič	71c60491f2	[Kernel] Build flash-attn from source (#8245)	2024-09-20 23:27:10 -07:00
bnellnm	de6f90a13d	[Misc] guard against change in cuda library name (#8609)	2024-09-20 06:36:30 +08:00
bnellnm	73202dbe77	[Kernel][Misc] register ops to prevent graph breaks (#6917) Co-authored-by: Sage Moore <sage@neuralmagic.com>	2024-09-11 12:52:19 -07:00
Li, Jiang	0b952af458	[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)	2024-09-11 09:46:46 -07:00
Jee Jee Li	f80ab3521c	Clean up remaining Punica C information (#7027)	2024-08-04 15:37:08 -07:00
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396)	2024-08-02 13:51:58 -07:00
Li, Jiang	3bbb4936dc	[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)	2024-07-26 13:50:10 -07:00
Chip Kerchner	38a1674abb	Support CPU inference with VSX PowerPC ISA (#5652)	2024-06-26 21:53:04 +00:00
Matt Wong	dd793d1de5	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422)	2024-06-25 15:56:15 -07:00
Hongxia Yang	f758aed0e8	[Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices (#5641)	2024-06-18 23:21:29 -07:00
Jie Fu (傅杰)	ab66536dbf	[CI/BUILD] Support non-AVX512 vLLM building and testing (#5574)	2024-06-17 14:36:10 -04:00
Jie Fu (傅杰)	cd9c0d65d9	[Hardware][Intel] Support CPU inference with AVX2 ISA (#5452)	2024-06-13 17:22:24 -06:00
bnellnm	5467ac3196	[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)	2024-06-09 16:23:30 -04:00
Cody Yu	c833101740	[Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535)	2024-05-09 18:04:17 -06:00
Matt Wong	59a6abf3c9	[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782)	2024-04-08 14:31:02 -07:00
Adrian Abeyta	2ff767b513	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com> Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu> Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com> Co-authored-by: guofangze <guofangze@kuaishou.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-03 14:15:55 -07:00
bigPYJ1151	0e3f06fe9c	[Hardware][Intel] Add CPU inference backend (#3634) Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>	2024-04-01 22:07:30 -07:00
mawong-amd	b6d103542c	[Kernel] Layernorm performance optimization (#3662)	2024-03-30 14:26:38 -07:00
Simon Mo	51c31bc10c	CMake build elf without PTX (#3739)	2024-03-30 01:53:08 +00:00
bnellnm	3ad438c66f	Fix build when nvtools is missing (#3698)	2024-03-29 18:52:39 -07:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
bnellnm	9fdf3de346	Cmake based build system (#2830)	2024-03-18 15:38:33 -07:00

34 Commits