20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Russell Bryant	cb84e45ac7	[Core] Upgrade to xgrammar 0.1.18, add cache size limit (#16283 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-08 19:13:22 -07:00
rongfu.leng	4716377fbc	[Feature] Estimate max-model-len use available KV cache memory (#16168 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:51 -07:00
rongfu.leng	4e9cf8c1dd	[Bugfix] fix gettid method is not define (#16084 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 19:12:44 -07:00
TJian	2976dc27e9	[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (#16198 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com> Co-authored-by: Hongxia Yang <hongxia.yang@amd.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-04-08 19:12:34 -07:00
Chauncey	102bf967f0	[Model] Add smolvlm support (#16017 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-08 19:12:17 -07:00
yueshen2016	1f4b09b525	Add support to modelopt quantization of Mixtral model (#15961 ) Signed-off-by: Yue <yueshen@nvidia.com>	2025-04-09 01:53:31 +00:00
Jee Jee Li	86c3369eb8	[CI/Build] Fix CI LoRA failure (#16270 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-09 09:13:56 +08:00
Russell Bryant	2755c34a8f	[V1] Update structured output offline inference example (#15721 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-08 22:34:09 +00:00
Jinzhen Lin	db10422184	[Bugfix] fix deepseek fp16 scale bug (#14809 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-08 16:56:09 -04:00
Lucas Wilkinson	e1a2c699dd	[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (#16209 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-08 18:56:51 +00:00
Harry Mellor	0115ccd5c0	Add warning that content below line in template will be removed (#16276 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-08 18:18:40 +00:00
Isotr0py	40b4284fe3	[Bugfix] Handle `process_weights_after_loading` for `QKVCrossParallelLinear` (#15328 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-08 10:02:23 -07:00
Cyrus Leung	4ebc0b9640	[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-08 09:45:21 -07:00
Kero Liang	dc96fd54c6	[Misc] Avoid stripping meaningful whitespace from `nvidia-smi topo -m` output in collect_env.py (#16272 ) Signed-off-by: imkero <kerorek@outlook.com>	2025-04-08 16:08:09 +00:00
wang.yuqi	1f5d13ab9f	[New Model]: jinaai/jina-embeddings-v3 (#16120 )	2025-04-08 08:39:12 -07:00
Harry Mellor	90cb44eb02	Update to transformers==4.51.1 (#16257 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-08 06:53:39 -07:00
Kebe	e11880deea	[Bugfix] Remove triton do_bench fast_flush arg (#16256 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-04-08 13:51:06 +00:00
TY-AMD	9351f91be9	[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247 ) Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>	2025-04-08 05:10:26 -07:00
rongfu.leng	5a1e1c8353	[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (#16203 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 04:05:47 -07:00
Alex Brooks	69ecaa7c79	[Misc] Add warning for multimodal data in LLM.beam_search (#16241 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-08 04:05:27 -07:00
Reid	7f00899ff7	[Misc] format and refactor some examples (#16252 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-08 10:42:32 +00:00
Simon Mo	995e3d1f41	[Docs] Add Slides from Singapore Meetup (#16213 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-04-08 07:20:22 +00:00
Kebe	b4ac449a83	[Misc] Merge the logs of pp layers partitions (#16225 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-04-08 00:18:15 -07:00
Michael Goin	8e5314a468	[V1] Add `disable_chunked_mm_input` arg to disable partial mm input prefill (#15837 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-07 23:24:07 -07:00
Siyuan Liu	87918e40c4	[torch.compile][TPU] Make @support_torch_compile work for XLA backend (#15782 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-08 14:23:53 +08:00
Isotr0py	f6b32efb7f	[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (#16194 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-08 13:38:13 +08:00
Michael Goin	b99733d092	[Bugfix] Do not skip "empty" parts of chats that are parsable (#16219 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-08 05:14:15 +00:00
Yong Hoon Shin	05a015d6a5	Add warning for Attention backends that do not support irope yet (#16212 )	2025-04-08 03:59:26 +00:00
zxfan-cpu	ad971af8c7	[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 (#16161 )	2025-04-07 20:48:47 -07:00
Roger Wang	f2ebb6f541	[V1] Scatter and gather placeholders in the model runner (#16076 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-04-08 10:43:41 +08:00
Satyajith Chilappagari	1d01211264	Update BASE_IMAGE to 2.22 release of Neuron (#16218 )	2025-04-07 19:11:18 -07:00
Miles Williams	f94ab12f79	[Misc] Update compressed-tensors to version 0.9.3 (#16196 ) Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com>	2025-04-07 19:09:06 -07:00
youkaichao	a865bc1ca6	[core] do not send error across process (#16174 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-04-07 19:09:03 -07:00
Michael Goin	21802c4b6d	[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (#16031 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-04-07 21:28:14 -04:00
Driss Guessous	652907b354	Torchao (#14231 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-04-07 19:39:28 -04:00
leon-seidel	24f1c01e0f	[Bugfix][V0] XGrammar structured output supports Enum (#15878 ) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-07 22:38:25 +00:00
Reid	fad6e2538e	[Misc] add description attribute in CLI (#15921 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-07 22:30:35 +00:00
Nick Hill	7f6d47c1a2	[V1][BugFix] Exit properly if engine core fails during startup (#16137 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-07 15:30:15 -07:00
Benjamin Chislett	3147586ebd	[Bugfix] Fix guidance backend for Qwen models (#16210 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-04-07 22:15:43 +00:00
Roger Wang	ed636d99ca	[Misc] Move Llama 4 projector call into encoder execution (#16201 )	2025-04-07 14:02:05 -07:00
Nicolò Lucchesi	090c856d76	[Misc] Human-readable `max-model-len` cli arg (#16181 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-04-07 14:40:58 -04:00
Gregory Shtrasberg	ad434d4cfe	Print the warning only once (#16193 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-04-07 18:30:06 +00:00
Cyrus Leung	66d433b94f	[V1] Revert the default `max_num_seqs` to V0 values for most hardware (#16158 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 13:54:36 -04:00
Cyrus Leung	027b204ff1	[Bugfix] Re-enable support for `ChatGLMForConditionalGeneration` (#16187 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 23:15:58 +08:00
Lu Fang	55dcce91df	Upstream Llama4 Support to Main (#16113 ) Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com> Signed-off-by: Chris Thi <chris.c.thi@gmail.com> Signed-off-by: drisspg <drisspguessous@gmail.com> Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Xiaodong Wang <xdwang@meta.com> Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 08:06:27 -07:00
Robin	8017c8db7f	[Doc]Update image to latest version (#16186 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-04-07 14:17:39 +00:00
Reid	dc3529dbf6	[Misc] improve example mlpspeculator and llm_engine_example (#16175 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-07 11:53:52 +00:00
YamPengLi	7699258ef0	[Model] Add Qwen3 and Qwen3MoE (#15289 ) Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-07 04:06:41 -07:00
Shanshan Shen	e9ba99f296	[V1][Structured Output] Add `supports_structured_output()` method to Platform (#16148 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-07 11:06:24 +00:00
Isotr0py	7c80368710	[VLM] Florence-2 supports online serving (#16164 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-07 04:04:02 -07:00

5718 Commits