5359 Commits

Author SHA1 Message Date
youkaichao
9606d572ed
[distributed] fix dp group (#15355)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-24 14:54:27 +00:00
Cyrus Leung
cbcdf2c609
[Bugfix] Fix chat template loading (#15143)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-24 13:50:09 +00:00
Russell Bryant
038de04d7b
Fix zmq IPv6 URL format error (#15341)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-24 09:30:41 -04:00
Jinzhen Lin
6b3cc75be0
[Kernel] allow non-contiguous input for marlin kernel (#14658)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-24 09:21:33 -04:00
Simon Mo
7ffcccfa5c
Revert "[CI/Build] Use uv python for docker rather than ppa:deadsnakess/ppa (#13569)" (#15377)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-24 05:53:10 -07:00
sfbemerk
cc8accfd53
[Misc] Update guided decoding logs to debug (#15310)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
2025-03-24 04:25:20 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
948ab03e7e
[Bugfix][V1] Avoid importing PreTrainedModel (#15366)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-03-24 10:33:12 +00:00
Rui Qiao
5797fb97e9
[Misc] Remove ignore_reinit_error for ray.init() (#15373) 2025-03-24 07:41:53 +00:00
Jee Jee Li
3892e58ad7
[Misc] Upgrade BNB version (#15183) 2025-03-24 05:51:42 +00:00
Qubitium-ModelCloud
d20e261199
Fix non-contiguous input passed to Marlin kernel (#15319) 2025-03-24 03:09:44 +00:00
Luka Govedič
f622dbcf39
[Fix] [torch.compile] Improve UUID system for custom passes (#15249)
Signed-off-by: luka <luka@neuralmagic.com>
2025-03-24 01:54:07 +00:00
Lucas Wilkinson
dccf535f8e
[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-23 15:07:04 -07:00
Roger Wang
9c5c81b0da
[Misc][Doc] Add note regarding loading generation_config by default (#15281)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-23 14:00:55 -07:00
Robin
d6cd59f122
[Frontend] Support tool calling and reasoning parser (#14511)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-23 14:00:07 -07:00
Woosuk Kwon
bc8ed3c4ba
[V1][Spec Decode] Use better defaults for N-gram (#15358)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-23 10:52:30 -07:00
Woosuk Kwon
b9bd76ca14
[V1][Spec Decode] Respect prompt_lookup_max (#15348)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-23 10:41:44 -07:00
DefTruth
6ebaf9ac71
[Bugfix] consider related env vars for torch.compiled cache hash (#14953)
Signed-off-by: DefTruth <31974251+DefTruth@users.noreply.github.com>
2025-03-23 15:53:09 +00:00
DefTruth
f90d34b498
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-03-23 01:10:10 -07:00
youkaichao
f68cce8e64
[ci/build] fix broken tests in LLM.collective_rpc (#15350)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-23 14:49:48 +08:00
youkaichao
09b6a95551
[ci/build] update torch nightly version for GH200 (#15135)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-23 14:04:13 +08:00
shangmingc
50c9636d87
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-22 19:28:10 -10:00
hijkzzz
0661cfef7a
Fix v1 supported oracle for worker-cls and worker-extension-cls (#15324)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-23 10:23:35 +08:00
Chen Zhang
a827aa815d
[doc] Add back previous news (#15331)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-22 17:38:33 -07:00
Russell Bryant
b877031d80
Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 14:06:39 -07:00
Wang Ran (汪然)
dd861b992f
[BugFix][Typing] Fix Imprecise Type Annotations (#15208)
Signed-off-by: Wang Ran (汪然) <wrran@outlook.com>
2025-03-22 09:05:03 -07:00
Russell Bryant
eb63ea1e18
[V1] Add disable-any-whitespace option support for xgrammar (#15316)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 15:56:17 +00:00
Naitong Yu
2f4bd358f1
[Model] Support Tele-FLM Model (#15023)
Signed-off-by: Naitong Yu <ntyu@baai.ac.cn>
Signed-off-by: jiangxin <horizon94@outlook.com>
Co-authored-by: Jason Fang <jasonfang3900@gmail.com>
Co-authored-by: jiangxin <horizon94@outlook.com>
2025-03-22 02:04:44 -07:00
Varun Sundar Rabindranath
8a8b30eac1
[Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes (#15308)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-22 02:03:32 -07:00
Jee Jee Li
2fa0e1396b
[Bugfix] Fix torch.compile raise FileNotFoundError (#15278)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-22 13:49:34 +08:00
wwl2755
1c2bec0f82
[Doc] add load_format items in docs (#14804)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-03-21 22:36:43 -07:00
TJian
ec870fba9a
[FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature (#14959)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-21 22:36:14 -07:00
Andy Lo
df1430265c
[Bugfix][V0] Multi-sequence logprobs streaming edge case (#15259)
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-03-21 22:35:37 -07:00
Rui Qiao
4c69e228b3
[Misc] Increase RayDistributedExecutor RAY_CGRAPH_get_timeout (#15301)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-21 22:25:43 -07:00
Russell Bryant
790b79750b
[Build/CI] Fix env var typo (#15305)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-21 22:28:46 +00:00
Nicolò Lucchesi
cfbb8c930f
[TPU][V1] MHA Pallas backend (#15288)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-21 08:50:39 -07:00
Cyrus Leung
baec0d4de9
Revert "[Feature] specify model in config.yaml (#14855)" (#15293)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-21 08:30:23 -07:00
Mengqing Cao
c21b99b912
[Bugfix][VLM] fix llava processor (#15285)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-21 05:14:36 -07:00
Chen Zhang
93a00d7dde
[v1] Refactor KVCacheConfig (#14079)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-21 04:56:27 -07:00
Russell Bryant
61e8c18350
[Misc] Add cProfile helpers (#15074)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-21 04:56:09 -07:00
Isotr0py
8afcd0f633
[Bugfix] Fix broken kernel test due to missing rename for v1 Triton backend (#15282)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-21 11:42:06 +00:00
Lehua Ding
91ca929dc7
[V1] Fix wrong import path of get_flash_attn_version (#15280)
Signed-off-by: Lehua Ding <lehuading@tencent.com>
2025-03-21 03:54:11 -07:00
Isotr0py
84e00adc8a
[Bugfix] Fix incorrect resolving order for transformers fallback (#15279)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-21 03:54:08 -07:00
Isotr0py
47c7126213
[Misc] Add attention mask pre-computation optimization back to Qwen2.5-VL (#15273)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-21 10:32:33 +00:00
Shanshan Shen
a989ca2bf6
[Bugfix] Add int8 torch dtype for KVCache (#15260)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-03-21 08:58:28 +00:00
Wei Zeng
0fa3970deb
[Feature] specify model in config.yaml (#14855)
Signed-off-by: weizeng <weizeng@roblox.com>
2025-03-21 00:26:03 -07:00
Nick Hill
da6ea29f7a
[V1] Avoid redundant input processing in n>1 case (#14985)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-20 22:24:10 -07:00
Edwin Hernandez
7297941b38
[Doc] Update LWS docs (#15163)
Signed-off-by: Edwinhr716 <Edandres249@gmail.com>
2025-03-20 21:18:47 -07:00
Isotr0py
f8a08cb90d
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs (#14071)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-21 03:14:19 +00:00
Siyuan Liu
b15fd2be2a
[Hardware][TPU] Add check for no additional graph compilation during runtime (#14710)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-21 03:05:28 +00:00
Woosuk Kwon
e588ac237c
Add an example for reproducibility (#15262)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-20 19:55:47 -07:00