Jinzhen Lin
|
d06ba4ed3f
|
[Kernel] moe wna16 marlin kernel (#14447)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-04-14 20:05:22 -07:00 |
|
Alex Brooks
|
6b40996ae8
|
[Core][Bugfix] Fix Offline MM Beam Search (#16390)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-15 10:33:02 +08:00 |
|
Shuqiao Li
|
d2020acac7
|
config check sleep mode support oot platforms (#16562)
|
2025-04-14 16:31:50 -07:00 |
|
Chengji Yao
|
1eb3c2ed48
|
[DOC][TPU] Add core idea about avoiding recompilation after warmup (#16614)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-04-14 21:56:06 +00:00 |
|
Siyuan Liu
|
c64ee87267
|
[Hardware][TPU] Add torchvision to tpu dependency file (#16616)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-04-14 17:50:46 -04:00 |
|
courage17340
|
b1308b84a3
|
[Model][VLM] Add Kimi-VL model support (#16387)
Signed-off-by: courage17340 <courage17340@163.com>
|
2025-04-14 21:41:48 +00:00 |
|
Nishan Acharya
|
7b5ecf79bd
|
s390x: Fix PyArrow build and add CPU test script for Buildkite CI (#16036)
Signed-off-by: Nishan Acharya <Nishan.Acharya@ibm.com>
|
2025-04-14 10:55:32 -07:00 |
|
Harry Mellor
|
9883a18859
|
Fix triton install condition on CPU (#16600)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-14 17:06:01 +00:00 |
|
Nicolò Lucchesi
|
b3f2fddd17
|
[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 (#16596)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-14 17:01:05 +00:00 |
|
Cyrus Leung
|
aa29841ede
|
[Bugfix] Multi-modal caches not acting like LRU caches (#16593)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-14 09:24:16 -07:00 |
|
Md. Shafi Hussain
|
6bf27affb6
|
[fix]: Dockerfile.ppc64le fixes for opencv-python and hf-xet (#16048)
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
|
2025-04-14 17:08:39 +01:00 |
|
shangmingc
|
1dd23386ec
|
[Misc] Update usage with mooncake lib for kv transfer (#16523)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-04-14 11:31:37 +00:00 |
|
Reid
|
7cbfc10943
|
[Misc] refactor examples (#16563)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-14 09:59:15 +00:00 |
|
DefTruth
|
ce4ddd2d1a
|
[Misc] remove warning if triton>=3.2.0 (#16553)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-04-14 02:39:47 -07:00 |
|
Harry Mellor
|
e51929ebca
|
Improve configs - SchedulerConfig (#16533)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-14 17:24:16 +08:00 |
|
Russell Bryant
|
dc1b4a6f13
|
[Core][V0] Enable regex support with xgrammar (#13228)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-14 10:13:38 +08:00 |
|
Jennifer Zhao
|
63d2705edb
|
[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556)
|
2025-04-13 17:20:26 -07:00 |
|
Michael Goin
|
d085a44082
|
Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (#16537)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-13 14:55:18 +00:00 |
|
Lily Liu
|
f49e5aff11
|
[V1][Spec Decode] KV cache slots for eagle heads (#16370)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-04-12 19:42:51 -07:00 |
|
Ryan McConville
|
6c11ecf8d3
|
[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529)
Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>
|
2025-04-12 20:19:19 +00:00 |
|
SnowCharm
|
93e5f3c5fb
|
[Perf] Optimize Preparing Inputs for GPU Model Runner (#16484)
Signed-off-by: snowcharm <snowcharmqq@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-12 22:54:37 +08:00 |
|
Jie Fu (傅杰)
|
70363bccfa
|
Fix syntaxWarning: invalid escape sequence '\s' (#16532)
Signed-off-by: Jie Fu <jiefu@tencent.com>
|
2025-04-12 14:39:42 +00:00 |
|
Jee Jee Li
|
3cdc57669f
|
[Misc] Delete redundant code (#16530)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-04-12 11:21:37 +00:00 |
|
Huazhong Ji
|
68bb122eb4
|
[MISC] Make GroupCoordinator compatible with out-of-tree devices (#16464)
Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>
|
2025-04-12 09:20:25 +00:00 |
|
Cyrus Leung
|
d9fc8cd9da
|
[V1] Enable multi-input by default (#15799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-12 08:52:39 +00:00 |
|
Nicolò Lucchesi
|
f069f3ea74
|
[Misc] Openai transcription client example use same Whisper model (#16487)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-12 07:27:03 +00:00 |
|
Cyrus Leung
|
c5bc0e7fcc
|
[Misc] Update chat utils tests (#16520)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-12 06:48:43 +00:00 |
|
Tianer Zhou
|
4a3a518722
|
fix: spelling (#16466)
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
|
2025-04-11 23:24:22 -07:00 |
|
wang.yuqi
|
fbf722c6e6
|
[Frontend] support matryoshka representation / support embedding API dimensions (#16331)
|
2025-04-11 23:23:10 -07:00 |
|
leon-seidel
|
e92d7085bf
|
[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516)
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
|
2025-04-11 23:22:07 -07:00 |
|
Michael Goin
|
bd6028d6b0
|
Optimized topk for topk=1 (Llama-4) (#16512)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-12 14:21:08 +08:00 |
|
Ye (Charlotte) Qi
|
802329dee9
|
[Doc] Update Llama4 Model Names in Supported Models (#16509)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-04-12 02:53:10 +00:00 |
|
Nick Hill
|
41cc883c29
|
[BugFix] Handle non-contiguous tensors properly when serializing (#16492)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-11 17:54:06 -07:00 |
|
Michael Goin
|
57504a4bcf
|
[CI][Bugfix] Add mistral_tool_use to Ci (#16517)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-11 17:52:38 -07:00 |
|
Yuan Tang
|
ed4792c990
|
[Doc] Fix link to vLLM blog (#16519)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-04-11 17:39:23 -07:00 |
|
Michael Goin
|
87b836ba77
|
Bugfix for PixtralHF models without spatial_merge_size (#16513)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-11 23:32:22 +00:00 |
|
rongfu.leng
|
56c76c2e0e
|
[Bugfix] clean up duplicated code (#16485)
Signed-off-by: Gogs <gogs@fake.local>
Co-authored-by: Gogs <gogs@fake.local>
|
2025-04-11 23:19:40 +00:00 |
|
Christian Sears
|
c09632a66c
|
Update openai_compatible_server.md (#16507)
Signed-off-by: Christian Sears <csears@redhat.com>
|
2025-04-11 22:54:58 +00:00 |
|
Yong Hoon Shin
|
a3bf8d4a2b
|
[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488)
|
2025-04-12 06:26:55 +08:00 |
|
Ye (Charlotte) Qi
|
16eda8c43a
|
[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
|
2025-04-12 06:26:17 +08:00 |
|
Harry Mellor
|
cd77382ac1
|
Improve configs - LoadConfig (#16422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-11 20:27:27 +00:00 |
|
Travis Johnson
|
71b9cde010
|
[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-04-11 19:59:50 +00:00 |
|
Isotr0py
|
5285589f37
|
[Doc] Document InternVL3 support (#16495)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-11 19:41:09 +00:00 |
|
Michael Goin
|
f41647ee6b
|
[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-11 17:54:08 +00:00 |
|
Nicolò Lucchesi
|
4d022cbc75
|
[TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-11 17:06:14 +00:00 |
|
Richard Zou
|
70de35a881
|
Fix erroneous "model doesn't support compile" warning (#16486)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-04-11 16:24:36 +00:00 |
|
Tomasz Zielinski
|
34b2cf3b33
|
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
|
2025-04-11 07:38:36 -07:00 |
|
chaow-amd
|
9e90c9f73f
|
[Bugfix] Fix bugs of running Quark quantized models (#16236)
Signed-off-by: chaow <chaow@amd.com>
|
2025-04-11 10:18:32 -04:00 |
|
DefTruth
|
e9528f6dc6
|
[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-04-11 06:50:50 -06:00 |
|
Harry Mellor
|
51baa9c333
|
Don't install triton on ppc64le platform (#16470)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-11 10:11:00 +00:00 |
|