20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Kebe	e11880deea	[Bugfix] Remove triton do_bench fast_flush arg (#16256 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-04-08 13:51:06 +00:00
TY-AMD	9351f91be9	[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247 ) Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>	2025-04-08 05:10:26 -07:00
rongfu.leng	5a1e1c8353	[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (#16203 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-08 04:05:47 -07:00
Alex Brooks	69ecaa7c79	[Misc] Add warning for multimodal data in LLM.beam_search (#16241 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-08 04:05:27 -07:00
Reid	7f00899ff7	[Misc] format and refactor some examples (#16252 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-08 10:42:32 +00:00
Simon Mo	995e3d1f41	[Docs] Add Slides from Singapore Meetup (#16213 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-04-08 07:20:22 +00:00
Kebe	b4ac449a83	[Misc] Merge the logs of pp layers partitions (#16225 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-04-08 00:18:15 -07:00
Michael Goin	8e5314a468	[V1] Add `disable_chunked_mm_input` arg to disable partial mm input prefill (#15837 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-07 23:24:07 -07:00
Siyuan Liu	87918e40c4	[torch.compile][TPU] Make @support_torch_compile work for XLA backend (#15782 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-08 14:23:53 +08:00
Isotr0py	f6b32efb7f	[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (#16194 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-08 13:38:13 +08:00
Michael Goin	b99733d092	[Bugfix] Do not skip "empty" parts of chats that are parsable (#16219 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-08 05:14:15 +00:00
Yong Hoon Shin	05a015d6a5	Add warning for Attention backends that do not support irope yet (#16212 )	2025-04-08 03:59:26 +00:00
zxfan-cpu	ad971af8c7	[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 (#16161 )	2025-04-07 20:48:47 -07:00
Roger Wang	f2ebb6f541	[V1] Scatter and gather placeholders in the model runner (#16076 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>	2025-04-08 10:43:41 +08:00
Satyajith Chilappagari	1d01211264	Update BASE_IMAGE to 2.22 release of Neuron (#16218 )	2025-04-07 19:11:18 -07:00
Miles Williams	f94ab12f79	[Misc] Update compressed-tensors to version 0.9.3 (#16196 ) Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com>	2025-04-07 19:09:06 -07:00
youkaichao	a865bc1ca6	[core] do not send error across process (#16174 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-04-07 19:09:03 -07:00
Michael Goin	21802c4b6d	[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (#16031 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-04-07 21:28:14 -04:00
Driss Guessous	652907b354	Torchao (#14231 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-04-07 19:39:28 -04:00
leon-seidel	24f1c01e0f	[Bugfix][V0] XGrammar structured output supports Enum (#15878 ) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-07 22:38:25 +00:00
Reid	fad6e2538e	[Misc] add description attribute in CLI (#15921 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-07 22:30:35 +00:00
Nick Hill	7f6d47c1a2	[V1][BugFix] Exit properly if engine core fails during startup (#16137 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-07 15:30:15 -07:00
Benjamin Chislett	3147586ebd	[Bugfix] Fix guidance backend for Qwen models (#16210 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-04-07 22:15:43 +00:00
Roger Wang	ed636d99ca	[Misc] Move Llama 4 projector call into encoder execution (#16201 )	2025-04-07 14:02:05 -07:00
Nicolò Lucchesi	090c856d76	[Misc] Human-readable `max-model-len` cli arg (#16181 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-04-07 14:40:58 -04:00
Gregory Shtrasberg	ad434d4cfe	Print the warning only once (#16193 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-04-07 18:30:06 +00:00
Cyrus Leung	66d433b94f	[V1] Revert the default `max_num_seqs` to V0 values for most hardware (#16158 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 13:54:36 -04:00
Cyrus Leung	027b204ff1	[Bugfix] Re-enable support for `ChatGLMForConditionalGeneration` (#16187 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 23:15:58 +08:00
Lu Fang	55dcce91df	Upstream Llama4 Support to Main (#16113 ) Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com> Signed-off-by: Chris Thi <chris.c.thi@gmail.com> Signed-off-by: drisspg <drisspguessous@gmail.com> Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Xiaodong Wang <xdwang@meta.com> Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 08:06:27 -07:00
Robin	8017c8db7f	[Doc]Update image to latest version (#16186 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-04-07 14:17:39 +00:00
Reid	dc3529dbf6	[Misc] improve example mlpspeculator and llm_engine_example (#16175 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-07 11:53:52 +00:00
YamPengLi	7699258ef0	[Model] Add Qwen3 and Qwen3MoE (#15289 ) Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-07 04:06:41 -07:00
Shanshan Shen	e9ba99f296	[V1][Structured Output] Add `supports_structured_output()` method to Platform (#16148 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-07 11:06:24 +00:00
Isotr0py	7c80368710	[VLM] Florence-2 supports online serving (#16164 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-07 04:04:02 -07:00
yihong	95d63f38c0	doc: fix some typos in doc (#16154 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-04-07 05:32:06 +00:00
Roger Wang	bb8dab821e	[CI] Set max transformers version for Ultravox model test (#16149 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-04-07 04:37:58 +00:00
Isotr0py	fc0f87768a	[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings (#16129 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-07 04:07:15 +00:00
Cyrus Leung	0a57386721	[Misc] Update Mistral-3.1 example (#16147 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 03:57:37 +00:00
Woosuk Kwon	3749e28774	[V1][Minor] Minor simplification for get_computed_blocks (#16139 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-06 20:38:12 -07:00
Kay Yan	86fc2321ff	[Metrics] Add bucket for `request_latency`, `time_to_first_token` and `time_per_output_token` (#15202 ) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-04-06 20:34:51 -07:00
Martin Hoyer	2549c0dfef	Fix requires-python (#16132 )	2025-04-06 19:22:25 -07:00
Woosuk Kwon	b10e519895	[V1][Minor] Optimize get_cached_block (#16135 )	2025-04-06 20:48:14 +00:00
Chengji Yao	9bde5ba127	[TPU] Update PyTorch/XLA (#16130 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-06 18:25:55 +00:00
Reid	72c8f1ad04	[Misc] update requires-python in pyproject.toml (#16116 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-06 14:56:34 +00:00
paolovic	da224daaa9	[Bugfix] add hf_token to EngineArgs (#16093 ) Signed-off-by: paolovic <paul-philipp.luley@uzh.ch> Co-authored-by: paolovic <paul-philipp.luley@uzh.ch>	2025-04-06 14:47:33 +00:00
Varun Sundar Rabindranath	3a100b9278	[Bugfix] LoRA : Fix the order in which the kernels process LoRAs (#16040 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-04-06 14:04:50 +00:00
rongfu.leng	242a637aea	[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (#16103 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-06 05:52:01 -07:00
Isotr0py	c2a9671510	[Misc] Improve model redirect to accept json dictionary (#16119 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-06 05:51:45 -07:00
Paul Schweigert	d5ae4f7f42	[Doc][Bugfix] Add missing EOF in k8s deploy doc (#16025 )	2025-04-06 12:10:57 +00:00
Reid	b6c502a150	[Misc] refactor example eagle (#16100 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-06 09:42:48 +00:00

5902 Commits