20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Benjamin Kitor	82eb61dd4c	[misc] use tqdm.auto where appropriate (#16290 ) Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>	2025-04-09 21:54:54 -07:00
yihong	2de4118243	fix: change GB to GiB in logging close #14979 (#15807 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-03-31 10:00:50 -07:00
Varun Sundar Rabindranath	6c663dfd5e	[misc] LoRA - Skip LoRA kernels when not required (#15152 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-26 11:33:45 +08:00
Cyrus Leung	f6137adbcb	Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785 )" (#14892 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-16 09:13:46 -07:00
Kyle Sayers	d30aa7e9e6	[Bugfix] Limit profiling run sequence length by max_model_len (#14785 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-03-16 07:44:19 -07:00
Jee Jee Li	b8b0ccbd2d	[Bugfix] Make the deviceprofiler include LoRA memory. (#14469 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-08 07:12:22 +00:00
Jun Duan	82fbeae92b	[Misc] Accurately capture the time of loading weights (#14063 ) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-03-01 17:20:30 -08:00
Benjamin Chislett	9804145cac	[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-02-27 15:28:08 -08:00
Joe Runde	3f808cc044	[Bugfix] Do not crash V0 engine on input errors (#13101 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-26 19:07:29 +08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555 )	2025-02-24 17:13:52 -08:00
Jun Duan	68d535ef44	[Misc] Capture and log the time of loading weights (#13666 )	2025-02-21 22:06:34 -08:00
Lucia Fang	f525c0be8b	[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-19 17:06:23 +08:00
Zhe Zhang	fdc5df6f54	use device param in load_model method (#13037 )	2025-02-19 16:05:02 +08:00
shangmingc	913df14da3	[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU (#12935 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-08 14:46:19 +00:00
Roger Wang	bf3b79efb8	[VLM] Qwen2.5-VL	2025-02-05 13:31:38 -08:00
Cody Yu	cf58b9c4ca	[MISC] Remove model input dumping when exception (#12582 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-03 13:34:16 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
fade_away	cb3e73e4c8	[BugFix] fix wrong output when using lora and num_scheduler_steps=8 (#11161 ) FIX issue https://github.com/vllm-project/vllm/issues/9688 https://github.com/vllm-project/vllm/issues/11086 #12487 --------- Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: weilong.yu <weilong.yu@shopee.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-02-01 12:52:07 +08:00
Lucas Wilkinson	cabaf4eff3	[Attention] MLA decode optimizations (#12528 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-01-30 23:49:37 -08:00
youkaichao	6dd94dbe94	[perf] fix perf regression from #12253 (#12380 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-24 11:34:27 +08:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
youkaichao	6e650f56a1	[torch.compile] decouple compile sizes and cudagraph sizes (#12243 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-24 02:01:30 +08:00
youkaichao	66818e5b63	[core] separate builder init and builder prepare for each batch (#12253 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-22 14:13:52 +08:00
Cyrus Leung	59a0192fb9	[Core] Interface for accessing model from `VllmRunner` (#10353 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-20 15:00:59 +08:00
Elfie Guo	0794e7446e	[Misc] Add multipstep chunked-prefill support for FlashInfer (#10467 )	2025-01-15 12:47:49 +08:00
Chen Zhang	cf5f000d21	[torch.compile] Hide KV cache behind torch.compile boundary (#11677 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-10 13:14:42 +08:00
Yan Burman	300acb8347	[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233 ) Signed-off-by: Yan Burman <yanburman@users.noreply.github.com> Signed-off-by: Ido Asraff <idoa@atero.ai>	2025-01-04 14:50:16 +08:00
bjmsong	187e32997c	[Bugfix] Change kv scaling factor by param json on nvidia gpu (#11688 ) Signed-off-by: bjmsong <bjmsong@126.com> Co-authored-by: bjmsong <bjmsong@126.com>	2025-01-02 21:11:39 +00:00
Michael Goin	b880ffb87e	[Misc] Add tqdm progress bar during graph capture (#11349 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-20 04:35:18 +00:00
youkaichao	be39e3cd18	[core] clean up cudagraph batchsize padding logic (#10996 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-13 06:57:50 +00:00
youkaichao	91642db952	[torch.compile] use depyf to dump torch.compile internals (#10972 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-11 10:43:05 -08:00
Aurick Qiao	d5c5154fcf	[Misc] LoRA + Chunked Prefill (#9057 )	2024-12-11 10:09:20 +08:00
youkaichao	dcdc3fafe5	[ci] fix broken tests (#10956 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-06 11:25:47 -08:00
youkaichao	dc5ce861bf	[torch.compile] remove compilation_context and simplify code (#10838 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-03 06:19:02 +00:00
Kuntai Du	0590ec3fd9	[Core] Implement disagg prefill by StatelessProcessGroup (#10502 ) This PR provides initial support for single-node disaggregated prefill in 1P1D scenario. Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>	2024-12-01 19:01:00 -06:00
Maximilien de Bayser	214efc2c3c	Support Cross encoder models (#10400 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Flavia Beo <flavia.beo@ibm.com> Co-authored-by: Flavia Beo <flavia.beo@ibm.com>	2024-11-24 18:56:20 -08:00
youkaichao	eebad39f26	[torch.compile] support all attention backends (#10558 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-22 14:04:42 -08:00
youkaichao	7851b45196	[5/N][torch.compile] torch.jit.script --> torch.compile (#10406 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-18 23:20:06 +08:00
youkaichao	51bb12d17b	[4/N][torch.compile] clean up set_torch_compile_backend (#10401 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-17 23:57:20 -08:00
youkaichao	4fd9375028	[2/N][torch.compile] make compilation cfg part of vllm cfg (#10383 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-16 18:02:14 -08:00
电脑星人	361c29e174	[Bugfix] Fix M-RoPE position calculation when chunked prefill is enabled (#10388 ) Signed-off-by: imkero <kerorek@outlook.com>	2024-11-17 02:10:00 +08:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
zifeitong	47db6ec831	[Frontend] Add per-request number of cached token stats (#10174 )	2024-11-12 16:42:28 +00:00
Cyrus Leung	e0191a95d8	[0/N] Rename `MultiModalInputs` to `MultiModalKwargs` (#10040 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-09 11:31:02 +08:00
Cyrus Leung	db7db4aab9	[Misc] Consolidate ModelConfig code related to HF config (#10104 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-07 06:00:21 +00:00
Wallas Henrique	966e31697b	[Bugfix] Fix pickle of input when async output processing is on (#9931 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-11-06 00:39:26 +00:00
Chenghao (Alan) Yang	09d3550372	[Misc] Add logging for CUDA memory (#10027 ) Signed-off-by: Chenghao Yang <yangalan1996@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Chenghao Yang <yangalan1996@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-05 09:50:50 -08:00
youkaichao	cea808f325	[3/N] model runner pass the whole config to model (#9958 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-02 12:08:49 -07:00
youkaichao	e893795443	[2/N] executor pass the complete config to worker/modelrunner (#9938 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2024-11-02 07:35:05 -07:00
Peter Salas	6c0b7f548d	[Core][VLM] Add precise multi-modal placeholder tracking (#8346 ) Signed-off-by: Peter Salas <peter@fixie.ai>	2024-11-01 16:21:10 -07:00

204 Commits