Michael Goin
bd6028d6b0
Optimized topk for topk=1 (Llama-4) (#16512)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-12 14:21:08 +08:00

Ye (Charlotte) Qi
802329dee9
[Doc] Update Llama4 Model Names in Supported Models (#16509)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-04-12 02:53:10 +00:00

Nick Hill
41cc883c29
[BugFix] Handle non-contiguous tensors properly when serializing (#16492)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-11 17:54:06 -07:00

Michael Goin
57504a4bcf
[CI][Bugfix] Add mistral_tool_use to Ci (#16517)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 17:52:38 -07:00

Yuan Tang
ed4792c990
[Doc] Fix link to vLLM blog (#16519)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-11 17:39:23 -07:00

Michael Goin
87b836ba77
Bugfix for PixtralHF models without spatial_merge_size (#16513)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 23:32:22 +00:00

rongfu.leng
56c76c2e0e
[Bugfix] clean up duplicated code (#16485)
Signed-off-by: Gogs <gogs@fake.local>
Co-authored-by: Gogs <gogs@fake.local>
2025-04-11 23:19:40 +00:00

Christian Sears
c09632a66c
Update openai_compatible_server.md (#16507)
Signed-off-by: Christian Sears <csears@redhat.com>
2025-04-11 22:54:58 +00:00

Yong Hoon Shin
a3bf8d4a2b
[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488)
2025-04-12 06:26:55 +08:00

Ye (Charlotte) Qi
16eda8c43a
[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
2025-04-12 06:26:17 +08:00

Harry Mellor
cd77382ac1
Improve configs - LoadConfig (#16422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-11 20:27:27 +00:00

Travis Johnson
71b9cde010
[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-04-11 19:59:50 +00:00

Isotr0py
5285589f37
[Doc] Document InternVL3 support (#16495)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-11 19:41:09 +00:00

Michael Goin
f41647ee6b
[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 17:54:08 +00:00

Nicolò Lucchesi
4d022cbc75
[TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-11 17:06:14 +00:00

Richard Zou
70de35a881
Fix erroneous "model doesn't support compile" warning (#16486)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-11 16:24:36 +00:00

Tomasz Zielinski
34b2cf3b33
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
2025-04-11 07:38:36 -07:00

chaow-amd
9e90c9f73f
[Bugfix] Fix bugs of running Quark quantized models (#16236)
Signed-off-by: chaow <chaow@amd.com>
2025-04-11 10:18:32 -04:00

DefTruth
e9528f6dc6
[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-11 06:50:50 -06:00

Harry Mellor
51baa9c333
Don't install triton on ppc64le platform (#16470)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-11 10:11:00 +00:00

Reid
35e076b3a8
[Misc] update api_client example (#16459)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-11 10:05:40 +00:00

Jee Jee Li
a26f59ccbc
[Misc] Raise error for V1 not supporting Long LoRA. (#16415)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-11 01:51:20 -07:00

Michael Goin
aa3b3d76e0
Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 08:09:52 +00:00

Jee Jee Li
f7030df3be
[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-11 15:32:37 +08:00

DefTruth
905e91e9ac
Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453)
2025-04-11 06:44:22 +00:00

Alex Brooks
f8f9c0ba62
[Bugfix] Don't set an upper bound on repetition penalty (#16403)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-11 14:19:40 +08:00

Li, Jiang
dda811021a
[CPU][Bugfix] Fix CPU docker issues (#16454)
Signed-off-by: jiang.li <jiang1.li@intel.com>
2025-04-11 14:19:07 +08:00

Isotr0py
93195146ea
[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-11 04:57:16 +00:00

Michael Goin
ed37599544
Update supported_hardware.md for TPU INT8 (#16437)
2025-04-11 12:28:07 +08:00

Yong Hoon Shin
99ef59cf7f
[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-04-10 21:26:07 -07:00

Chenyaaang
d544d141ec
update benchmark_serving_structured_output to include auto backend (#16438)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-11 12:25:52 +08:00

Alexey Belyakov
3e397a9484
check input length of sonnet samples (#16423)
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>
2025-04-11 10:15:06 +08:00

WWW
268c325078
Fix range_ratio Bug in RandomDataset (#16126)
Signed-off-by: jadewang21 <jadewangcn@outlook.com>
2025-04-10 15:31:17 -07:00

Nicolò Lucchesi
3cc9af88ff
[TPU][V1] Disable per-request seed/Generator (#16172)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-10 17:05:44 -04:00

look
7cd0bd7212
[Bugfix] Fix output token length check logic (#16419)
Signed-off-by: look <eeslook@163.com>
2025-04-10 20:16:48 +00:00

Cyrus Leung
56d4aefa33
[VLM] Avoid unnecessary dummy multimodal data during processing (#16416)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-10 19:32:14 +00:00

Nick Hill
dd143ef541
[V1] Zero-copy tensor/ndarray serialization/transmission (#13790)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-10 19:23:14 +00:00

Chih-Chieh Yang
daefed052c
[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-04-10 19:07:07 +00:00

Chenyaaang
5fbab20e02
[Bugfix] Fix bug when dataset is json (#15899)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-10 18:35:41 +00:00

Lily Liu
e8224f3dca
[V1][Spec Decode] Eagle Model loading (#16035)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-04-10 11:21:48 -07:00

Russell Bryant
9665313c39
[V1] Set structured output backend to auto by default (#15724)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-10 17:53:26 +00:00

Harry Mellor
0c54fc7273
Improve configs - ParallelConfig (#16332)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-10 17:34:37 +00:00

Nicolò Lucchesi
c1b57855ec
[TPU][V1] Use language_model interface for getting text backbone in MM (#16410)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-10 17:32:04 +00:00

Cyrus Leung
83b824c8b4
[VLM] Remove BaseProcessingInfo.get_mm_max_tokens_per_item (#16408)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-10 09:06:58 -07:00

Lu Fang
7678fcd5b6
Fix the torch version parsing logic (#15857)
2025-04-10 07:37:47 -07:00

wineandchord
8661c0241d
[CI] Add auto update workflow for Dockerfile graph (#11879)
Signed-off-by: wineandchord <guoqizhou19@gmail.com>
2025-04-10 13:43:05 +00:00

Reid
ce8d6b75fc
[doc] update the wrong link (#16401)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-10 21:02:37 +08:00

Ye (Charlotte) Qi
61de3ef74b
[Model] Remove image mm limit for LLaMa4 (#16365)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-04-10 09:36:27 +00:00

cyyever
ec1f9c8c91
Update Numba to 0.61.2 (#16376)
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-10 07:59:37 +00:00

Reid
65e09094c4
[doc] add download model tips (#16389)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-10 07:45:26 +00:00