20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
rongfu.leng	4e9cf8c1dd	[Bugfix] fix gettid method is not define (#16084) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:44 -07:00
Li, Jiang	550b2801ad	[CPU][Bugfix] Using custom allreduce for CPU backend (#15934) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-04-02 07:46:47 -07:00
Thien Tran	4f044b1d67	[Kernel][CPU] CPU MLA (#14744) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-03-25 09:34:59 +00:00
Li, Jiang	a2ae496589	[CPU] Support FP8 KV cache (#14741) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-03-14 22:07:36 -07:00
Thien Tran	27b50f1fe6	[Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel (#14667) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-03-13 23:47:49 -07:00
Dilip Gowda Bhagavan	ada19210a3	Adding cpu inference with VXE ISA for s390x architecture (#12613) Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com> Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com> Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>	2025-03-06 08:40:53 -08:00
Sheng Yao	09e56f9262	[Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure (#14051)	2025-03-02 17:35:01 -08:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Harry Mellor	3ea7b94523	Move linting to `pre-commit` (#11975) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-20 14:58:01 +08:00
Wallas Henrique	cfd3219f58	[Hardware][Apple] Native support for macOS Apple Silicon (#11696) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-08 16:35:49 +08:00
Lu Fang	4068f4b5b5	[MISC] Replace c10::optional with std::optional (#11730) Signed-off-by: Lu Fang <lufang@fb.com>	2025-01-05 10:20:34 +09:00
Sanket Kale	a6760f6456	[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228) Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2024-11-25 18:32:39 -08:00
Manjul Mohan	1ea291a417	Fix: Build error seen on Power Architecture (#10421) Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com> Signed-off-by: B-201 <Joy25810@foxmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: ismael-dm <ismaeldm99@gmail.com> Signed-off-by: Andrew Nesbitt <andrewnez@gmail.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: yan ma <yan.ma@intel.com> Signed-off-by: Angus Wang <wangjadehao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: rickyx <rickyx@anyscale.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Manjul Mohan manjul.mohan@ibm.com <manjulmohan@ltcd97-lp2.aus.stglabs.ibm.com> Co-authored-by: B-201 <Joy25810@foxmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: ismael-dm <ismaeldm99@gmail.com> Co-authored-by: Andrew Nesbitt <andrewnez@gmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Yan Ma <yan.ma@intel.com> Co-authored-by: Angus Wang <wangjadehao@gmail.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Ricky Xu <rickyx@anyscale.com> Co-authored-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2024-11-19 09:34:57 -08:00
Maximilien de Bayser	4a18fd14ba	Support Roberta embedding models (#9387) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-14 21:23:29 +00:00
Li, Jiang	a6f332d0d9	[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target (#10108) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-07 18:42:50 +08:00
Li, Jiang	a4b3e0c1e9	[Hardware][CPU] Update torch 2.5 (#9911) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2024-11-07 04:43:08 +00:00
Li, Jiang	5eda21e773	[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344)	2024-10-17 12:21:04 -04:00
Luka Govedič	5d73ae49d6	[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)	2024-09-16 11:52:40 -07:00
bnellnm	73202dbe77	[Kernel][Misc] register ops to prevent graph breaks (#6917) Co-authored-by: Sage Moore <sage@neuralmagic.com>	2024-09-11 12:52:19 -07:00
Li, Jiang	0b952af458	[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)	2024-09-11 09:46:46 -07:00
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396)	2024-08-02 13:51:58 -07:00
Li, Jiang	3bbb4936dc	[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)	2024-07-26 13:50:10 -07:00
Michael Goin	978aed5300	[Kernel][Attention] Separate `Attention.kv_scale` into `k_scale` and `v_scale` (#6081)	2024-07-16 15:31:32 -07:00
Chip Kerchner	38a1674abb	Support CPU inference with VSX PowerPC ISA (#5652)	2024-06-26 21:53:04 +00:00
Roger Wang	bd620b01fb	[Kernel][CPU] Add Quick `gelu` to CPU (#5717)	2024-06-21 06:39:40 +00:00
Jie Fu (傅杰)	cd9c0d65d9	[Hardware][Intel] Support CPU inference with AVX2 ISA (#5452)	2024-06-13 17:22:24 -06:00
bnellnm	5467ac3196	[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)	2024-06-09 16:23:30 -04:00
Jie Fu (傅杰)	6840a71610	[Misc] Remove unused cuda_utils.h in CPU backend (#5345)	2024-06-07 14:09:13 -07:00
Yuan	cafb8e06c5	[CI/BUILD] enable intel queue for longer CPU tests (#4113)	2024-06-03 10:39:50 -07:00
SnowDist	a22dea54d3	[Model] Support MAP-NEO model (#5081) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-05-30 19:24:41 -07:00
Eric Xihui Lin	8e192ff967	[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799) Co-authored-by: beagleski <yunanzhang@microsoft.com> Co-authored-by: bapatra <bapatra@microsoft.com> Co-authored-by: Barun Patra <codedecde@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-05-24 22:00:52 -07:00
Michael Goin	5f6d10c14c	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00
Steve Grubb	dac6a3f6ed	[Misc] Apply a couple g++ cleanups (#4719)	2024-05-10 13:37:05 +00:00
youkaichao	20cfcdec99	[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659)	2024-05-08 12:07:05 -07:00
youkaichao	63575bc2e1	[Core][Optimization] change python dict to pytorch tensor (#4607)	2024-05-06 21:30:27 -07:00
SangBin Cho	3521ba4f25	[Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518)	2024-05-03 10:20:12 -07:00
Woosuk Kwon	498eb5cfa3	[Bugfix] Add kv_scale input parameter to CPU backend (#3840)	2024-04-04 04:33:08 +00:00
bigPYJ1151	0e3f06fe9c	[Hardware][Intel] Add CPU inference backend (#3634) Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>	2024-04-01 22:07:30 -07:00

38 Commits