20231088/vllm

Author	SHA1	Message	Date
Chen Zhang	69d765f5a5	[V1] Move more control of kv cache initialization from model_executor to EngineCore (#11960) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-01-17 07:39:35 +00:00
Roger Wang	70755e819e	[V1][Core] Autotune encoder cache budget (#11895) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-15 11:29:00 -08:00
Chen Zhang	cf5f000d21	[torch.compile] Hide KV cache behind torch.compile boundary (#11677) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-10 13:14:42 +08:00
Roger Wang	91b361ae89	[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-06 19:58:16 +00:00
Woosuk Kwon	06bfb51963	[V1] Add BlockTable class (#11693) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-06 14:24:42 +09:00
Yan Burman	300acb8347	[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233) Signed-off-by: Yan Burman <yanburman@users.noreply.github.com> Signed-off-by: Ido Asraff <idoa@atero.ai>	2025-01-04 14:50:16 +08:00
Woosuk Kwon	b55ed6ef8a	[V1][Minor] Optimize token_ids_cpu copy (#11692) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-02 12:04:58 -07:00
Woosuk Kwon	73001445fb	[V1] Implement Cascade Attention (#11635) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-01 21:56:46 +09:00
Roger Wang	e7c7c5e822	[V1][VLM] V1 support for selected single-image models. (#11632) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-12-31 21:17:22 +00:00
sroy745	dcb1a944d4	[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681) Signed-off-by: Sourashis Roy <sroy@roblox.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-26 19:02:58 +09:00
Roger Wang	04139ade59	[V1] Fix profiling for models with merged input processor (#11370) Signed-off-by: ywang96 <ywang@roblox.com>	2024-12-20 12:04:21 +00:00
Roger Wang	7379b3d4b2	[V1] Fix multimodal profiling for `Molmo` (#11325) Signed-off-by: ywang96 <ywang@example.com> Co-authored-by: ywang96 <ywang@example.com>	2024-12-19 16:27:22 +00:00
Alexander Matveev	fdea8ec167	[V1] VLM - enable processor cache by default (#11305) Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>	2024-12-18 18:54:46 -05:00
Roger Wang	59c9b6ebeb	[V1][VLM] Proper memory profiling for image language models (#11210) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: ywang96 <ywang@example.com>	2024-12-16 22:10:57 -08:00
Woosuk Kwon	25ebed2f8c	[V1][Minor] Cache np arange to reduce input preparation overhead (#11214) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-15 13:33:00 -08:00
Mark McLoughlin	6d917d0eeb	Enable mypy checking on V1 code (#11105) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2024-12-14 09:54:04 -08:00
youkaichao	be39e3cd18	[core] clean up cudagraph batchsize padding logic (#10996) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-13 06:57:50 +00:00
Woosuk Kwon	f092153fbe	[V1] Use more persistent buffers to optimize input preparation overheads (#11111) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-11 23:14:20 -08:00
Woosuk Kwon	d643c2aba1	[V1] Use input_ids as input for text-only models (#11032) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-11 10:49:23 -08:00
Mor Zusman	ffa48c9146	[Model] PP support for Mamba-like models (#10992) Signed-off-by: mzusman <mor.zusmann@gmail.com>	2024-12-10 21:53:37 -05:00
youkaichao	75f89dc44c	[torch.compile] add a flag to track batchsize statistics (#11059) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-10 12:40:52 -08:00
Tyler Michael Smith	28b3a1c7e5	[V1] Multiprocessing Tensor Parallel Support for v1 (#9856) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-10 06:28:14 +00:00
youkaichao	1a2f8fb828	[v1] fix use compile sizes (#11000) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-09 13:47:24 -08:00
Varun Sundar Rabindranath	25b79d9fd3	[V1] Input Batch Relocation (#10962) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-09 09:33:41 -08:00
Woosuk Kwon	2a56e1264f	[V1] Fix when max_model_len is not divisible by block_size (#10903) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-04 16:54:05 -08:00
youkaichao	dc5ce861bf	[torch.compile] remove compilation_context and simplify code (#10838) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-03 06:19:02 +00:00
Roger Wang	2f0a0a17a4	[V1] Refactor model executable interface for multimodal models (#10570) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-11-26 20:46:11 +00:00
Sage Moore	9a88f89799	custom allreduce + torch.compile (#10121) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-25 22:00:16 -08:00
youkaichao	eebad39f26	[torch.compile] support all attention backends (#10558) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-22 14:04:42 -08:00
Woosuk Kwon	f9310cbd0c	[V1] Fix Compilation config & Enable CUDA graph by default (#10528) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-21 12:53:39 -08:00
Luka Govedič	8b0fe06c89	[torch.compile] Inductor code caching fix (#10273) Signed-off-by: luka <luka@neuralmagic.com> Signed-off-by: Luka Govedic <luka.govedic@gmail.com>	2024-11-20 21:44:57 -08:00
youkaichao	803f37eaaa	[6/N] torch.compile rollout to users (#10437) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-19 10:09:03 -08:00
youkaichao	4fd9375028	[2/N][torch.compile] make compilation cfg part of vllm cfg (#10383) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-16 18:02:14 -08:00
Cyrus Leung	0b8bb86bf1	[1/N] Initial prototype for multi-modal processor (#10044) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-13 12:39:03 +00:00
Woosuk Kwon	bbd3e86926	[V1] Support VLMs with fine-grained scheduling (#9871) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-11-13 04:53:13 +00:00
Woosuk Kwon	1f55e05713	[V1] Enable Inductor when using piecewise CUDA graphs (#10268) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-12 13:39:56 -08:00
Woosuk Kwon	9d5b4e4dea	[V1] Enable custom ops with piecewise CUDA graphs (#10228) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-11 11:58:07 -08:00
Woosuk Kwon	fe15729a2b	[V1] Use custom ops for piecewise CUDA graphs (#10227) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-11 11:26:48 -08:00
Woosuk Kwon	d7a4f2207b	[V1] Do not use inductor for piecewise CUDA graphs (#10225) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-11 11:05:57 -08:00
Woosuk Kwon	b5815c8413	[V1] Fix non-cudagraph op name (#10166) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-08 10:23:04 -08:00
Nick Hill	1fa020c539	[V1][BugFix] Fix Generator construction in greedy + seed case (#10097) Signed-off-by: Nick Hill <nhill@redhat.com>	2024-11-07 05:06:57 +00:00
Joe Runde	d58268c56a	[V1] Make v1 more testable (#9888) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-11-06 11:57:35 -08:00
Woosuk Kwon	4089985552	[V1] Integrate Piecewise CUDA graphs (#10058) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-05 22:16:04 -08:00
Nick Hill	1f1b6d6eda	[V1] Support per-request seed (#9945) Signed-off-by: Nick Hill <nickhill@us.ibm.com>	2024-11-03 09:14:17 -08:00
youkaichao	cea808f325	[3/N] model runner pass the whole config to model (#9958) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-02 12:08:49 -07:00
youkaichao	e893795443	[2/N] executor pass the complete config to worker/modelrunner (#9938) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2024-11-02 07:35:05 -07:00
Woosuk Kwon	6c5af09b39	[V1] Implement vLLM V1 [1/N] (#9289)	2024-10-22 01:24:07 -07:00

47 Commits