| Author | Commit | Message | Date |
|--------|--------|---------|------|
| Ethan Xu | dbfe254eda | [Feature] vLLM CLI (#5090) (Co-authored-by: simon-mo <simon.mo@hey.com>) | 2024-07-14 15:36:43 -07:00 |
| youkaichao | ccd3c04571 | [ci][build] fix commit id (#6420) (Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>) | 2024-07-14 22:16:21 +08:00 |
| Michael Goin | 111fc6e7ec | [Misc] Add generated git commit hash as `vllm.__commit__` (#6386) | 2024-07-12 22:52:15 +00:00 |
| Ilya Lavrenov | 57f09a419c | [Hardware][Intel] OpenVINO vLLM backend (#5379) | 2024-06-28 13:50:16 +00:00 |
| Kunshang Ji | 728c4c8a06 | [Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814) (Co-authored-by: Jiang Li <jiang1.li@intel.com>, Abhilash Majumder <abhilash.majumder@intel.com>, Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>) | 2024-06-17 11:01:25 -07:00 |
| Cyrus Leung | 03dccc886e | [Misc] Add vLLM version getter to utils (#5098) | 2024-06-13 11:21:39 -07:00 |
| Kevin H. Luu | 916d219d62 | [ci] Use sccache to build images (#5419) (Signed-off-by: kevin <kevin@anyscale.com>) | 2024-06-12 17:58:12 -07:00 |
| Woosuk Kwon | 1a8bfd92d5 | [Hardware] Initial TPU integration (#5292) | 2024-06-12 11:53:03 -07:00 |
| Woosuk Kwon | 8bab4959be | [Misc] Remove `VLLM_BUILD_WITH_NEURON` env variable (#5389) | 2024-06-11 00:37:56 -07:00 |
| bnellnm | 5467ac3196 | [Kernel][Misc] Use `TORCH_LIBRARY` instead of `PYBIND11_MODULE` for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
| Divakar Verma | a66cf40b20 | [Kernel][ROCm][AMD] enable fused `topk_softmax` kernel for moe layer (#4927) (This PR enables the fused `topk_softmax` kernel used in the moe layer for HIP) | 2024-06-02 14:13:26 -07:00 |
| Daniele | a360ff80bb | [CI/Build] CMakeLists: build all extensions' cmake targets at the same time (#5034) | 2024-05-31 22:06:45 -06:00 |
| youkaichao | 5bd3c65072 | [Core][Optimization] remove vllm-nccl (#5091) | 2024-05-29 05:13:52 +00:00 |
| Sanger Steel | 8bc68e198c | [Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208) | 2024-05-13 14:57:07 -07:00 |
| kliuae | ff5abcd746 | [ROCm] Add support for Punica kernels on AMD GPUs (#3140) (Co-authored-by: miloice <jeffaw99@hotmail.com>) | 2024-05-09 09:19:50 -07:00 |
| Woosuk Kwon | 89579a201f | [Misc] Use vllm-flash-attn instead of flash-attn (#4686) | 2024-05-08 13:15:34 -07:00 |
| youkaichao | 344bf7cd2d | [Misc] add installation time env vars (#4574) | 2024-05-03 15:55:56 -07:00 |
| Hu Dong | 5ad60b0cbd | [Misc] Exclude the tests directory from being packaged (#4552) | 2024-05-02 10:50:25 -07:00 |
| Travis Johnson | 8b798eec75 | [CI/Build][Bugfix] `VLLM_USE_PRECOMPILED` should skip compilation (#4534) (Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>) | 2024-05-01 18:01:50 +00:00 |
| Alpay Ariyak | 715c2d854d | [Frontend] [Core] Tensorizer: support dynamic `num_readers`, update version (#4467) | 2024-04-30 16:32:13 -07:00 |
| SangBin Cho | a88081bf76 | [CI] Disable non-lazy string operation on logging (#4326) (Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>) | 2024-04-26 00:16:58 -07:00 |
| Liangfu Chen | cd2f63fb36 | [CI/CD] add neuron docker and ci test scripts (#3571) | 2024-04-18 15:26:01 -07:00 |
| Nick Hill | 563c54f760 | [BugFix] Fix tensorizer extra in setup.py (#4072) | 2024-04-14 14:12:42 -07:00 |
| Sanger Steel | 711a000255 | [Frontend] [Core] feat: Add model loading using tensorizer (#3476) | 2024-04-13 17:13:01 -07:00 |
| Michael Feil | c2b4a1bce9 | [Doc] Add typing hints / mypy types cleanup (#3816) (Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>) | 2024-04-11 17:17:21 -07:00 |
| Woosuk Kwon | cfaf49a167 | [Misc] Define common requirements (#3841) | 2024-04-05 00:39:17 -07:00 |
| youkaichao | ca81ff5196 | [Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805) | 2024-04-04 10:26:19 -07:00 |
| bigPYJ1151 | 0e3f06fe9c | [Hardware][Intel] Add CPU inference backend (#3634) (Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>, Yuan Zhou <yuan.zhou@intel.com>) | 2024-04-01 22:07:30 -07:00 |
| youkaichao | 3492859b68 | [CI/Build] update default number of jobs and nvcc threads to avoid overloading the system (#3675) | 2024-03-28 00:18:54 -04:00 |
| youkaichao | 8f44facddd | [Core] remove cupy dependency (#3625) | 2024-03-27 00:33:26 -07:00 |
| SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| youkaichao | 42bc386129 | [CI/Build] respect the common environment variable `MAX_JOBS` (#3600) | 2024-03-24 17:04:00 -07:00 |
| Zhuohan Li | 523e30ea0c | [BugFix] Hot fix in setup.py for neuron build (#3537) | 2024-03-20 17:59:52 -07:00 |
| bnellnm | ba8ae1d84f | Check for `_is_cuda()` in `compute_num_jobs` (#3481) | 2024-03-20 10:06:56 -07:00 |
| bnellnm | 9fdf3de346 | Cmake based build system (#2830) | 2024-03-18 15:38:33 -07:00 |
| Woosuk Kwon | abfc4f3387 | [Misc] Use dataclass for `InputMetadata` (#3452) (Co-authored-by: youkaichao <youkaichao@126.com>) | 2024-03-17 10:02:46 +00:00 |
| Simon Mo | 6b78837b29 | Fix setup.py neuron-ls issue (#2671) | 2024-03-16 16:00:25 -07:00 |
| Simon Mo | 8e67598aa6 | [Misc] fix line length for entire codebase (#3444) | 2024-03-16 00:36:29 -07:00 |
| youkaichao | 604f235937 | [Misc] add error message in non linux platform (#3438) | 2024-03-15 21:21:37 +00:00 |
| 陈序 | 739c350c19 | [Minor Fix] Use cupy-cuda11x in CUDA 11.8 build (#3256) | 2024-03-13 09:43:24 -07:00 |
| Zhuohan Li | 2f8844ba08 | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00 |
| Woosuk Kwon | 1cb0cc2975 | [FIX] Make `flash_attn` optional (#3269) | 2024-03-08 10:52:20 -08:00 |
| Woosuk Kwon | 2daf23ab0c | Separate attention backends (#3005) | 2024-03-07 01:45:50 -08:00 |
| Robert Shaw | c0c2335ce0 | Integrate Marlin Kernels for Int4 GPTQ inference (#2497) (Co-authored-by: Robert Shaw <114415538+rib-2@users.noreply.github.com>, alexm <alexm@neuralmagic.com>) | 2024-03-01 12:47:51 -08:00 |
| Billy Cao | 2c08ff23c0 | Fix building from source on WSL (#3112) | 2024-02-29 11:13:58 -08:00 |
| Philipp Moritz | cfc15a1031 | Optimize Triton MoE Kernel (#2979) (Co-authored-by: Cade Daniel <edacih@gmail.com>) | 2024-02-26 13:48:56 -08:00 |
| James Whedbee | 264017a2bf | [ROCm] include gfx908 as supported (#2792) | 2024-02-19 17:58:59 -08:00 |
| Hongxia Yang | 0580aab02f | [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768) | 2024-02-10 23:14:37 -08:00 |
| Philipp Moritz | 931746bc6d | Add documentation on how to do incremental builds (#2796) | 2024-02-07 14:42:02 -08:00 |
| Woosuk Kwon | f0d4e14557 | Add fused top-K softmax kernel for MoE (#2769) | 2024-02-05 17:38:02 -08:00 |