20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Michael Goin	bd6028d6b0	Optimized topk for topk=1 (Llama-4) (#16512) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-12 14:21:08 +08:00
Ye (Charlotte) Qi	802329dee9	[Doc] Update Llama4 Model Names in Supported Models (#16509) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-12 02:53:10 +00:00
Nick Hill	41cc883c29	[BugFix] Handle non-contiguous tensors properly when serializing (#16492) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-11 17:54:06 -07:00
Michael Goin	57504a4bcf	[CI][Bugfix] Add mistral_tool_use to Ci (#16517) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 17:52:38 -07:00
Yuan Tang	ed4792c990	[Doc] Fix link to vLLM blog (#16519) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-04-11 17:39:23 -07:00
Michael Goin	87b836ba77	Bugfix for PixtralHF models without spatial_merge_size (#16513) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 23:32:22 +00:00
rongfu.leng	56c76c2e0e	[Bugfix] clean up duplicated code (#16485) Signed-off-by: Gogs <gogs@fake.local> Co-authored-by: Gogs <gogs@fake.local>	2025-04-11 23:19:40 +00:00
Christian Sears	c09632a66c	Update openai_compatible_server.md (#16507) Signed-off-by: Christian Sears <csears@redhat.com>	2025-04-11 22:54:58 +00:00
Yong Hoon Shin	a3bf8d4a2b	[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488)	2025-04-12 06:26:55 +08:00
Ye (Charlotte) Qi	16eda8c43a	[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Kai Wu <kaiwu@meta.com>	2025-04-12 06:26:17 +08:00
Harry Mellor	cd77382ac1	Improve configs - `LoadConfig` (#16422) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-11 20:27:27 +00:00
Travis Johnson	71b9cde010	[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-11 19:59:50 +00:00
Isotr0py	5285589f37	[Doc] Document InternVL3 support (#16495) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-11 19:41:09 +00:00
Michael Goin	f41647ee6b	[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 17:54:08 +00:00
Nicolò Lucchesi	4d022cbc75	[TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (#16483) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-11 17:06:14 +00:00
Richard Zou	70de35a881	Fix erroneous "model doesn't support compile" warning (#16486) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-11 16:24:36 +00:00
Tomasz Zielinski	34b2cf3b33	[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779) Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>	2025-04-11 07:38:36 -07:00
chaow-amd	9e90c9f73f	[Bugfix] Fix bugs of running Quark quantized models (#16236) Signed-off-by: chaow <chaow@amd.com>	2025-04-11 10:18:32 -04:00
DefTruth	e9528f6dc6	[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-11 06:50:50 -06:00
Harry Mellor	51baa9c333	Don't install triton on `ppc64le` platform (#16470) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-11 10:11:00 +00:00
Reid	35e076b3a8	[Misc] update api_client example (#16459) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-11 10:05:40 +00:00
Jee Jee Li	a26f59ccbc	[Misc] Raise error for V1 not supporting Long LoRA. (#16415) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 01:51:20 -07:00
Michael Goin	aa3b3d76e0	Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 08:09:52 +00:00
Jee Jee Li	f7030df3be	[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 15:32:37 +08:00
DefTruth	905e91e9ac	Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453)	2025-04-11 06:44:22 +00:00
Alex Brooks	f8f9c0ba62	[Bugfix] Don't set an upper bound on repetition penalty (#16403) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-11 14:19:40 +08:00
Li, Jiang	dda811021a	[CPU][Bugfix] Fix CPU docker issues (#16454) Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-04-11 14:19:07 +08:00
Isotr0py	93195146ea	[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-11 04:57:16 +00:00
Michael Goin	ed37599544	Update supported_hardware.md for TPU INT8 (#16437)	2025-04-11 12:28:07 +08:00
Yong Hoon Shin	99ef59cf7f	[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 21:26:07 -07:00
Chenyaaang	d544d141ec	update benchmark_serving_structured_output to include auto backend (#16438) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-11 12:25:52 +08:00
Alexey Belyakov	3e397a9484	check input length of sonnet samples (#16423) Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>	2025-04-11 10:15:06 +08:00
WWW	268c325078	Fix range_ratio Bug in RandomDataset (#16126) Signed-off-by: jadewang21 <jadewangcn@outlook.com>	2025-04-10 15:31:17 -07:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
look	7cd0bd7212	[Bugfix] Fix output token length check logic (#16419) Signed-off-by: look <eeslook@163.com>	2025-04-10 20:16:48 +00:00
Cyrus Leung	56d4aefa33	[VLM] Avoid unnecessary dummy multimodal data during processing (#16416) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 19:32:14 +00:00
Nick Hill	dd143ef541	[V1] Zero-copy tensor/ndarray serialization/transmission (#13790) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-10 19:23:14 +00:00
Chih-Chieh Yang	daefed052c	[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>	2025-04-10 19:07:07 +00:00
Chenyaaang	5fbab20e02	[Bugfix] Fix bug when dataset is json (#15899) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-04-10 18:35:41 +00:00
Lily Liu	e8224f3dca	[V1][Spec Decode] Eagle Model loading (#16035) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-10 11:21:48 -07:00
Russell Bryant	9665313c39	[V1] Set structured output backend to `auto` by default (#15724) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-10 17:53:26 +00:00
Harry Mellor	0c54fc7273	Improve configs - `ParallelConfig` (#16332) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-10 17:34:37 +00:00
Nicolò Lucchesi	c1b57855ec	[TPU][V1] Use `language_model` interface for getting text backbone in MM (#16410) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:32:04 +00:00
Cyrus Leung	83b824c8b4	[VLM] Remove `BaseProcessingInfo.get_mm_max_tokens_per_item` (#16408) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 09:06:58 -07:00
Lu Fang	7678fcd5b6	Fix the torch version parsing logic (#15857)	2025-04-10 07:37:47 -07:00
wineandchord	8661c0241d	[CI] Add auto update workflow for Dockerfile graph (#11879) Signed-off-by: wineandchord <guoqizhou19@gmail.com>	2025-04-10 13:43:05 +00:00
Reid	ce8d6b75fc	[doc] update the wrong link (#16401) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-10 21:02:37 +08:00
Ye (Charlotte) Qi	61de3ef74b	[Model] Remove image mm limit for LLaMa4 (#16365) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 09:36:27 +00:00
cyyever	ec1f9c8c91	Update Numba to 0.61.2 (#16376) Signed-off-by: cyy <cyyever@outlook.com>	2025-04-10 07:59:37 +00:00
Reid	65e09094c4	[doc] add download model tips (#16389) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-10 07:45:26 +00:00

5902 Commits