| Travis Johnson | 71b9cde010 | [Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> | 2025-04-11 19:59:50 +00:00 |
| Isotr0py | 5285589f37 | [Doc] Document InternVL3 support (#16495) Signed-off-by: Isotr0py <2037008807@qq.com> | 2025-04-11 19:41:09 +00:00 |
| Michael Goin | f41647ee6b | [Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366) Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-11 17:54:08 +00:00 |
| Nicolò Lucchesi | 4d022cbc75 | [TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483) Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-04-11 17:06:14 +00:00 |
| Richard Zou | 70de35a881 | Fix erroneous "model doesn't support compile" warning (#16486) Signed-off-by: rzou <zou3519@gmail.com> | 2025-04-11 16:24:36 +00:00 |
| Tomasz Zielinski | 34b2cf3b33 | [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779) Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com> | 2025-04-11 07:38:36 -07:00 |
| chaow-amd | 9e90c9f73f | [Bugfix] Fix bugs of running Quark quantized models (#16236) Signed-off-by: chaow <chaow@amd.com> | 2025-04-11 10:18:32 -04:00 |
| DefTruth | e9528f6dc6 | [Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) Signed-off-by: DefTruth <qiustudent_r@163.com> | 2025-04-11 06:50:50 -06:00 |
| Harry Mellor | 51baa9c333 | Don't install triton on ppc64le platform (#16470) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-11 10:11:00 +00:00 |
| Reid | 35e076b3a8 | [Misc] update api_client example (#16459) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> | 2025-04-11 10:05:40 +00:00 |
| Jee Jee Li | a26f59ccbc | [Misc] Raise error for V1 not supporting Long LoRA. (#16415) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-04-11 01:51:20 -07:00 |
| Michael Goin | aa3b3d76e0 | Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447) Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-11 08:09:52 +00:00 |
| Jee Jee Li | f7030df3be | [Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-04-11 15:32:37 +08:00 |
| DefTruth | 905e91e9ac | Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453) | 2025-04-11 06:44:22 +00:00 |
| Alex Brooks | f8f9c0ba62 | [Bugfix] Don't set an upper bound on repetition penalty (#16403) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Nick Hill <nhill@redhat.com> | 2025-04-11 14:19:40 +08:00 |
| Li, Jiang | dda811021a | [CPU][Bugfix] Fix CPU docker issues (#16454) Signed-off-by: jiang.li <jiang1.li@intel.com> | 2025-04-11 14:19:07 +08:00 |
| Isotr0py | 93195146ea | [Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424) Signed-off-by: Isotr0py <2037008807@qq.com> | 2025-04-11 04:57:16 +00:00 |
| Michael Goin | ed37599544 | Update supported_hardware.md for TPU INT8 (#16437) | 2025-04-11 12:28:07 +08:00 |
| Yong Hoon Shin | 99ef59cf7f | [Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com> | 2025-04-10 21:26:07 -07:00 |
| Chenyaaang | d544d141ec | update benchmark_serving_structured_output to include auto backend (#16438) Signed-off-by: Chenyaaang <chenyangli@google.com> | 2025-04-11 12:25:52 +08:00 |
| Alexey Belyakov | 3e397a9484 | check input length of sonnet samples (#16423) Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com> | 2025-04-11 10:15:06 +08:00 |
| WWW | 268c325078 | Fix range_ratio Bug in RandomDataset (#16126) Signed-off-by: jadewang21 <jadewangcn@outlook.com> | 2025-04-10 15:31:17 -07:00 |
| Nicolò Lucchesi | 3cc9af88ff | [TPU][V1] Disable per-request seed/Generator (#16172) Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-04-10 17:05:44 -04:00 |
| look | 7cd0bd7212 | [Bugfix] Fix output token length check logic (#16419) Signed-off-by: look <eeslook@163.com> | 2025-04-10 20:16:48 +00:00 |
| Cyrus Leung | 56d4aefa33 | [VLM] Avoid unnecessary dummy multimodal data during processing (#16416) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-10 19:32:14 +00:00 |
| Nick Hill | dd143ef541 | [V1] Zero-copy tensor/ndarray serialization/transmission (#13790) Signed-off-by: Nick Hill <nhill@redhat.com> | 2025-04-10 19:23:14 +00:00 |
| Chih-Chieh Yang | daefed052c | [Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com> | 2025-04-10 19:07:07 +00:00 |
| Chenyaaang | 5fbab20e02 | [Bugfix] Fix bug when dataset is json (#15899) Signed-off-by: Chenyaaang <chenyangli@google.com> | 2025-04-10 18:35:41 +00:00 |
| Lily Liu | e8224f3dca | [V1][Spec Decode] Eagle Model loading (#16035) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com> | 2025-04-10 11:21:48 -07:00 |
| Russell Bryant | 9665313c39 | [V1] Set structured output backend to auto by default (#15724) Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2025-04-10 17:53:26 +00:00 |
| Harry Mellor | 0c54fc7273 | Improve configs - ParallelConfig (#16332) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-10 17:34:37 +00:00 |
| Nicolò Lucchesi | c1b57855ec | [TPU][V1] Use language_model interface for getting text backbone in MM (#16410) Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-04-10 17:32:04 +00:00 |
| Cyrus Leung | 83b824c8b4 | [VLM] Remove BaseProcessingInfo.get_mm_max_tokens_per_item (#16408) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-10 09:06:58 -07:00 |
| Lu Fang | 7678fcd5b6 | Fix the torch version parsing logic (#15857) | 2025-04-10 07:37:47 -07:00 |
| wineandchord | 8661c0241d | [CI] Add auto update workflow for Dockerfile graph (#11879) Signed-off-by: wineandchord <guoqizhou19@gmail.com> | 2025-04-10 13:43:05 +00:00 |
| Reid | ce8d6b75fc | [doc] update the wrong link (#16401) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> | 2025-04-10 21:02:37 +08:00 |
| Ye (Charlotte) Qi | 61de3ef74b | [Model] Remove image mm limit for LLaMa4 (#16365) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> | 2025-04-10 09:36:27 +00:00 |
| cyyever | ec1f9c8c91 | Update Numba to 0.61.2 (#16376) Signed-off-by: cyy <cyyever@outlook.com> | 2025-04-10 07:59:37 +00:00 |
| Reid | 65e09094c4 | [doc] add download model tips (#16389) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> | 2025-04-10 07:45:26 +00:00 |
| Michael Goin | c70cf0fe06 | [Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038) Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-04-10 15:08:47 +08:00 |
| Cyrus Leung | a5d11a54dc | [Bugfix] Fix validation error for text-only Mllama 3.2 (#16377) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-10 14:19:42 +08:00 |
| Cyrus Leung | 3d4c87758e | [Misc] Update transformers version limits of multi-modal tests (#16381) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-09 23:03:33 -07:00 |
| Aaron Ang | a9bd832fc5 | [Model] use AutoWeightsLoader for deepseek_v2, internlm2 (#16383) Signed-off-by: Aaron Ang <aaron.angyd@gmail.com> | 2025-04-09 23:01:00 -07:00 |
| Chenyaaang | 417bcefbae | fix sonnet dataset sample when prefix len is very small (#16379) Signed-off-by: Chenyaaang <chenyangli@google.com> | 2025-04-10 05:35:07 +00:00 |
| Michael Goin | baada0e737 | [Bugfix][TPU] Fix TPU validate_request (#16369) Signed-off-by: Michael Goin <mgoin64@gmail.com> | 2025-04-10 12:55:12 +08:00 |
| Benjamin Kitor | 82eb61dd4c | [misc] use tqdm.auto where appropriate (#16290) Signed-off-by: Benjamin Kitor <bkitor@gigaio.com> | 2025-04-09 21:54:54 -07:00 |
| Roger Wang | 0d4d06fe2f | [CI][Bugfix] Pin triton version for CPU (#16384) Signed-off-by: Roger Wang <ywang@roblox.com> | 2025-04-10 04:35:00 +00:00 |
| Jintao | 4aed0ca6a2 | [bugfix] Avoid the time consumption caused by creating dummy videos. (#16371) | 2025-04-10 04:30:05 +00:00 |
| Chengji Yao | 1621b25288 | [TPU] Fix dummy loading OOM (#16372) Signed-off-by: Chengji Yao <chengjiyao@google.com> | 2025-04-10 04:06:16 +00:00 |
| Aaron Ang | a564797151 | [Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral (#16325) Signed-off-by: Aaron Ang <aaron.angyd@gmail.com> | 2025-04-09 20:07:40 -07:00 |