20231088/vllm - vllm - Luminance Code Repo


Author	SHA1	Message	Date
wwl2755	1c2bec0f82	[Doc] add load_format items in docs (#14804) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-03-21 22:36:43 -07:00
Woosuk Kwon	2b22290ce0	[V1] Add flag to disable cascade attention (#15243) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-20 15:24:16 -07:00
Wang Ran (汪然)	bfe2fe0af4	typo: Update config.py (#15189)	2025-03-19 23:31:21 -07:00
Matt Ritter	a8652f4f0f	Enable CUDA graph support for llama 3.2 vision (#14917) Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com>	2025-03-19 23:29:16 -07:00
Russell Bryant	1f16b7fe74	[Core][V0] Add guidance backend for structured output (#14589) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Loc Huynh <lohuynh@microsoft.com> Co-authored-by: Michal Moskal <michal@moskal.me> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-19 21:33:51 -07:00
Alexander Matveev	cfbca8a2f2	[V1] TPU - Tensor parallel MP support (#15059)	2025-03-20 00:55:18 +00:00
Cyrus Leung	f690372b68	[Core] Update dtype detection and defaults (#14858) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 13:49:33 +08:00
Jee Jee Li	46c759c165	[Bugfix] Fix LoRA extra vocab size (#15047) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-18 09:40:29 -07:00
yury-tokpanov	452e8fd968	[MODEL] Add support for Zamba2 models (#13185) Signed-off-by: Yury Tokpanov <yury@zyphra.com> Signed-off-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-18 08:56:21 -07:00
hoshi-hiyouga	414919138b	[Bugfix] torchrun compatibility (#14899) Signed-off-by: hiyouga <hiyouga@buaa.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-03-18 05:49:27 -07:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
Michael Goin	14f301b541	Update to torch==2.6.0 (#12721) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: luka <luka@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-14 16:58:30 -04:00
Varun Sundar Rabindranath	0b1cfa6180	[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-13 20:42:04 -07:00
Cyrus Leung	f53a0586b9	[Bugfix] Fix prompt format of GLM4V (#14539) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-13 11:37:17 +00:00
Mathis Felardos	1bd32bc8dd	[Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config (#14367) Signed-off-by: Mathis Felardos <mathis@mistral.ai>	2025-03-12 20:15:20 -07:00
Woosuk Kwon	53be4a8634	[V1] Allow sliding window + prefix caching (#13069) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-12 11:21:19 -07:00
Sage Moore	d9f83d6206	[ROCm] Enable chunked prefill/paged attention in MLA on ROCm (#14316) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-03-12 15:51:20 +00:00
Woosuk Kwon	c0c25e25fa	[Model] Add support for Gemma 3 (#14660) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-12 08:36:33 -07:00
Pavani Majety	debd6bbf09	[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-03-12 05:13:11 +00:00
Roger Wang	1fc973c0b5	[V1][Core] Fix memory issue with logits & sampling (#14508) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>	2025-03-11 04:03:41 +00:00
Harry Mellor	3b352a2f92	Correct capitalisation: `VLLM` -> `vLLM` (#14562) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-10 16:36:21 +00:00
Aaron Pham	0b7f06b447	[Misc] add `use_tqdm_on_load` to reduce logs (#14407) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-08 05:57:46 -08:00
Harry Mellor	47512b3200	Default to `generation_config` from model (#12622) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 14:46:15 +08:00
Cyrus Leung	05fb6718f0	[Bugfix] Clean up multi-modal processors (#14417) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-07 10:33:38 +00:00
Tyler Michael Smith	cc2f9b32c8	[Distributed] Add enable_expert_parallel arg (#14305) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 18:54:45 +00:00
youkaichao	151b08e0fe	[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-07 00:32:46 +08:00
Congcong Chen	0a995d5434	[Model] New model support for Phi-4-multimodal-instruct (#14119)	2025-03-04 20:57:01 -08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
Ce Gao	bf33700ecd	[v0][structured output] Support reasoning output (#12955) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-02 14:49:42 -05:00
Luka Govedič	bd56c983d6	[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902) Signed-off-by: luka <luka@neuralmagic.com>	2025-02-28 16:20:11 -07:00
Roger Wang	6c85da3a18	[V1]`SupportsV0Only` protocol for model definitions (#13959) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-27 20:02:15 -05:00
Benjamin Chislett	9804145cac	[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-02-27 15:28:08 -08:00
Cyrus Leung	a2dd48c386	[VLM] Deprecate legacy input mapper for OOT multimodal models (#13979) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-27 19:14:55 +00:00
Szymon Ożóg	7f0be2aa24	[Model] Deepseek GGUF support (#13167)	2025-02-27 02:08:35 -08:00
Sage Moore	1d35662e6d	[ROCm] Disable chunked prefill/prefix caching when running MLA on non-cuda platforms (#13844) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-02-26 14:56:58 +08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Robert Shaw	f61528d46d	[Misc][Chore] Clean Up `AsyncOutputProcessing` Logs (#13780)	2025-02-24 16:39:07 -08:00
Robert Shaw	1f0ae3ed0a	[Misc] Clean Up `EngineArgs.create_engine_config` (#13734) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-02-24 13:52:21 -05:00
Nicolò Lucchesi	444b0f0f62	[Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-02-24 10:43:21 -05:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583)	2025-02-24 07:33:20 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
youkaichao	2382ad29d1	[ci] fix linter (#13701) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 20:28:59 +08:00
youkaichao	3e472d882a	[core] set up data parallel communication (#13591) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 19:28:59 +08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295)	2025-02-22 00:20:45 -08:00
Mark McLoughlin	1cd981da4f	[V1][Metrics] Support `vllm:cache_config_info` (#13299)	2025-02-22 00:20:00 -08:00
Lucas Wilkinson	288cc6c234	[Attention] MLA with chunked prefill (#12639) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Patrick Horn <patrick.horn@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-21 15:30:12 -08:00
Michael Goin	71face8540	[Bugfix] Fix max_num_batched_tokens for MLA (#13620) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-20 17:45:20 -08:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
Yannick Schnider	423330263b	[Feature] Pluggable platform-specific scheduler (#13161) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>	2025-02-19 17:16:38 +08:00
Lucia Fang	f525c0be8b	[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-19 17:06:23 +08:00


452 Commits