20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Russell Bryant	dc1b4a6f13	[Core][V0] Enable regex support with xgrammar (#13228 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-14 10:13:38 +08:00
Jennifer Zhao	63d2705edb	[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556 )	2025-04-13 17:20:26 -07:00
Michael Goin	d085a44082	Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (#16537 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-13 14:55:18 +00:00
Lily Liu	f49e5aff11	[V1][Spec Decode] KV cache slots for eagle heads (#16370 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-12 19:42:51 -07:00
Ryan McConville	6c11ecf8d3	[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529 ) Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>	2025-04-12 20:19:19 +00:00
SnowCharm	93e5f3c5fb	[Perf] Optimize Preparing Inputs for GPU Model Runner (#16484 ) Signed-off-by: snowcharm <snowcharmqq@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-12 22:54:37 +08:00
Jie Fu (傅杰)	70363bccfa	Fix syntaxWarning: invalid escape sequence '\s' (#16532 ) Signed-off-by: Jie Fu <jiefu@tencent.com>	2025-04-12 14:39:42 +00:00
Jee Jee Li	3cdc57669f	[Misc] Delete redundant code (#16530 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-04-12 11:21:37 +00:00
Huazhong Ji	68bb122eb4	[MISC] Make GroupCoordinator compatible with out-of-tree devices (#16464 ) Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>	2025-04-12 09:20:25 +00:00
Cyrus Leung	d9fc8cd9da	[V1] Enable multi-input by default (#15799 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 08:52:39 +00:00
Nicolò Lucchesi	f069f3ea74	[Misc] Openai transcription client example use same Whisper model (#16487 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-12 07:27:03 +00:00
Cyrus Leung	c5bc0e7fcc	[Misc] Update chat utils tests (#16520 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 06:48:43 +00:00
Tianer Zhou	4a3a518722	fix: spelling (#16466 ) Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>	2025-04-11 23:24:22 -07:00
wang.yuqi	fbf722c6e6	[Frontend] support matryoshka representation / support embedding API dimensions (#16331 )	2025-04-11 23:23:10 -07:00
leon-seidel	e92d7085bf	[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516 ) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-11 23:22:07 -07:00
Michael Goin	bd6028d6b0	Optimized topk for topk=1 (Llama-4) (#16512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-12 14:21:08 +08:00
Ye (Charlotte) Qi	802329dee9	[Doc] Update Llama4 Model Names in Supported Models (#16509 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-12 02:53:10 +00:00
Nick Hill	41cc883c29	[BugFix] Handle non-contiguous tensors properly when serializing (#16492 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-11 17:54:06 -07:00
Michael Goin	57504a4bcf	[CI][Bugfix] Add mistral_tool_use to Ci (#16517 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 17:52:38 -07:00
Yuan Tang	ed4792c990	[Doc] Fix link to vLLM blog (#16519 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-04-11 17:39:23 -07:00
Michael Goin	87b836ba77	Bugfix for PixtralHF models without spatial_merge_size (#16513 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 23:32:22 +00:00
rongfu.leng	56c76c2e0e	[Bugfix] clean up duplicated code (#16485 ) Signed-off-by: Gogs <gogs@fake.local> Co-authored-by: Gogs <gogs@fake.local>	2025-04-11 23:19:40 +00:00
Christian Sears	c09632a66c	Update openai_compatible_server.md (#16507 ) Signed-off-by: Christian Sears <csears@redhat.com>	2025-04-11 22:54:58 +00:00
Yong Hoon Shin	a3bf8d4a2b	[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488 )	2025-04-12 06:26:55 +08:00
Ye (Charlotte) Qi	16eda8c43a	[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Kai Wu <kaiwu@meta.com>	2025-04-12 06:26:17 +08:00
Harry Mellor	cd77382ac1	Improve configs - `LoadConfig` (#16422 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-11 20:27:27 +00:00
Travis Johnson	71b9cde010	[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-11 19:59:50 +00:00
Isotr0py	5285589f37	[Doc] Document InternVL3 support (#16495 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-11 19:41:09 +00:00
Michael Goin	f41647ee6b	[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 17:54:08 +00:00
Nicolò Lucchesi	4d022cbc75	[TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (#16483 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-11 17:06:14 +00:00
Richard Zou	70de35a881	Fix erroneous "model doesn't support compile" warning (#16486 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-11 16:24:36 +00:00
Tomasz Zielinski	34b2cf3b33	[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779 ) Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>	2025-04-11 07:38:36 -07:00
chaow-amd	9e90c9f73f	[Bugfix] Fix bugs of running Quark quantized models (#16236 ) Signed-off-by: chaow <chaow@amd.com>	2025-04-11 10:18:32 -04:00
DefTruth	e9528f6dc6	[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-11 06:50:50 -06:00
Harry Mellor	51baa9c333	Don't install triton on `ppc64le` platform (#16470 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-11 10:11:00 +00:00
Reid	35e076b3a8	[Misc] update api_client example (#16459 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-11 10:05:40 +00:00
Jee Jee Li	a26f59ccbc	[Misc] Raise error for V1 not supporting Long LoRA. (#16415 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 01:51:20 -07:00
Michael Goin	aa3b3d76e0	Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 08:09:52 +00:00
Jee Jee Li	f7030df3be	[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 15:32:37 +08:00
DefTruth	905e91e9ac	Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453 )	2025-04-11 06:44:22 +00:00
Alex Brooks	f8f9c0ba62	[Bugfix] Don't set an upper bound on repetition penalty (#16403 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-11 14:19:40 +08:00
Li, Jiang	dda811021a	[CPU][Bugfix] Fix CPU docker issues (#16454 ) Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-04-11 14:19:07 +08:00
Isotr0py	93195146ea	[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-11 04:57:16 +00:00
Michael Goin	ed37599544	Update supported_hardware.md for TPU INT8 (#16437 )	2025-04-11 12:28:07 +08:00
Yong Hoon Shin	99ef59cf7f	[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 21:26:07 -07:00
Chenyaaang	d544d141ec	update benchmark_serving_structured_output to include auto backend (#16438 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-11 12:25:52 +08:00
Alexey Belyakov	3e397a9484	check input length of sonnet samples (#16423 ) Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>	2025-04-11 10:15:06 +08:00
WWW	268c325078	Fix range_ratio Bug in RandomDataset (#16126 ) Signed-off-by: jadewang21 <jadewangcn@outlook.com>	2025-04-10 15:31:17 -07:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
look	7cd0bd7212	[Bugfix] Fix output token length check logic (#16419 ) Signed-off-by: look <eeslook@163.com>	2025-04-10 20:16:48 +00:00

5817 Commits