Yuxuan Zhang | 1e44ffc3ff | 2025-04-10 09:19:42 +08:00
Add GLM-4-0414 support (#16338)
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: yihong <zouzou0208@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>

Chengji Yao | a454748544 | 2025-04-09 18:51:51 -06:00
[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275)
Signed-off-by: Chengji Yao <chengjiyao@google.com>

Reid | 1bff42c4b7 | 2025-04-09 23:32:42 +00:00
[Misc] refactor Structured Outputs example (#16322)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Joe Runde | cb391d85dc | 2025-04-09 12:50:01 -07:00
[Hardware] add platform-specific request validation api (#16291)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

Russell Bryant | fee5b8d37f | 2025-04-09 19:14:06 +00:00
[Build/CI] Add tracing deps to vllm container image (#15224)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

Michael Goin | b2ce859bd2 | 2025-04-09 19:09:28 +00:00
Fix benchmark_throughput.py --backend=hf (#16352)
Signed-off-by: mgoin <mgoin64@gmail.com>

Chendi.Xue | 566f10a929 | 2025-04-09 17:52:26 +00:00
[CI]Fix hpu docker and numpy version for CI (#16355)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>

Guillaume Calmettes | c3b5189137 | 2025-04-09 17:33:24 +00:00
[Bugfix] catch AssertionError in MistralTokenizer as ValueError (#16344)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

zh Wang | a25866ac8d | 2025-04-09 17:03:34 +00:00
[Bugfix] Fix profiling.py (#16202)
Signed-off-by: zh Wang <rekind133@outlook.com>

Michael Goin | 098900d7c2 | 2025-04-09 07:59:36 -07:00
Revert "Update label-tpu mergify and remove removal bot" (#16350)

Guillaume Calmettes | 98d01d3ce2 | 2025-04-09 05:11:10 -07:00
[Bugfix][Frontend] respect provided default guided decoding backend (#15476)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

Nicolò Lucchesi | d55244df31 | 2025-04-09 04:12:54 -07:00
[Model] Add SupportsMultiModal.get_language_model interface (#16007)
Signed-off-by: NickLucche <nlucches@redhat.com>

yihong | 04149cce27 | 2025-04-09 03:43:59 -07:00
[BugFix] fix some typos found by typos. (#16314)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>

ajayvohra2005 | 24834f4894 | 2025-04-09 03:43:22 -07:00
update neuron config (#16289)
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>

Lucia Fang | ec7da6fcf3 | 2025-04-09 00:59:14 -07:00
[BugFix] llama4 qknorm should be not shared across head (#16311)
Signed-off-by: Lu Fang <fanglu@fb.com>

yihong | 819d548e8a | 2025-04-09 00:59:02 -07:00
[BugFix] logger is not callable (#16312)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>

Michael Goin | 477d2a8aa2 | 2025-04-09 07:56:25 +00:00
Update label-tpu mergify and remove removal bot (#16298)

Cyrus Leung | e484e02857 | 2025-04-09 00:51:27 -07:00
[Bugfix] Avoid transferring cached multi-modal items from P0 to P1 (#16273)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Accelerator1996 | 24f6b9a713 | 2025-04-09 14:47:30 +08:00
[Misc] Fix test_sharded_state_loader.py(#16004) (#16005)
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>

Luka Govedič | 9cdde47289 | 2025-04-08 23:46:45 -07:00
[BugFix] Fix fusion test and add them to CI (#16287)
Signed-off-by: luka <luka@neuralmagic.com>

Chengji Yao | b1eb4ca152 | 2025-04-09 14:46:32 +08:00
[TPU] Update PyTorch/XLA (#16288)
Signed-off-by: Chengji Yao <chengjiyao@google.com>

Michael Goin | 87b4ac56c2 | 2025-04-09 04:14:46 +00:00
[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (#16221)
Signed-off-by: mgoin <mgoin64@gmail.com>

Russell Bryant | cb84e45ac7 | 2025-04-08 19:13:22 -07:00
[Core] Upgrade to xgrammar 0.1.18, add cache size limit (#16283)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

rongfu.leng | 4716377fbc | 2025-04-08 19:12:51 -07:00
[Feature] Estimate max-model-len use available KV cache memory (#16168)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

rongfu.leng | 4e9cf8c1dd | 2025-04-08 19:12:44 -07:00
[Bugfix] fix gettid method is not define (#16084)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

TJian | 2976dc27e9 | 2025-04-08 19:12:34 -07:00
[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (#16198)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>

Chauncey | 102bf967f0 | 2025-04-08 19:12:17 -07:00
[Model] Add smolvlm support (#16017)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

yueshen2016 | 1f4b09b525 | 2025-04-09 01:53:31 +00:00
Add support to modelopt quantization of Mixtral model (#15961)
Signed-off-by: Yue <yueshen@nvidia.com>

Jee Jee Li | 86c3369eb8 | 2025-04-09 09:13:56 +08:00
[CI/Build] Fix CI LoRA failure (#16270)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Russell Bryant | 2755c34a8f | 2025-04-08 22:34:09 +00:00
[V1] Update structured output offline inference example (#15721)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

Jinzhen Lin | db10422184 | 2025-04-08 16:56:09 -04:00
[Bugfix] fix deepseek fp16 scale bug (#14809)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

Lucas Wilkinson | e1a2c699dd | 2025-04-08 18:56:51 +00:00
[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (#16209)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

Harry Mellor | 0115ccd5c0 | 2025-04-08 18:18:40 +00:00
Add warning that content below line in template will be removed (#16276)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Isotr0py | 40b4284fe3 | 2025-04-08 10:02:23 -07:00
[Bugfix] Handle process_weights_after_loading for QKVCrossParallelLinear (#15328)
Signed-off-by: Isotr0py <2037008807@qq.com>

Cyrus Leung | 4ebc0b9640 | 2025-04-08 09:45:21 -07:00
[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Kero Liang | dc96fd54c6 | 2025-04-08 16:08:09 +00:00
[Misc] Avoid stripping meaningful whitespace from nvidia-smi topo -m output in collect_env.py (#16272)
Signed-off-by: imkero <kerorek@outlook.com>

wang.yuqi | 1f5d13ab9f | 2025-04-08 08:39:12 -07:00
[New Model]: jinaai/jina-embeddings-v3 (#16120)

Harry Mellor | 90cb44eb02 | 2025-04-08 06:53:39 -07:00
Update to transformers==4.51.1 (#16257)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Kebe | e11880deea | 2025-04-08 13:51:06 +00:00
[Bugfix] Remove triton do_bench fast_flush arg (#16256)
Signed-off-by: Kebe <mail@kebe7jun.com>

TY-AMD | 9351f91be9 | 2025-04-08 05:10:26 -07:00
[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247)
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

rongfu.leng | 5a1e1c8353 | 2025-04-08 04:05:47 -07:00
[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (#16203)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

Alex Brooks | 69ecaa7c79 | 2025-04-08 04:05:27 -07:00
[Misc] Add warning for multimodal data in LLM.beam_search (#16241)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Reid | 7f00899ff7 | 2025-04-08 10:42:32 +00:00
[Misc] format and refactor some examples (#16252)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Simon Mo | 995e3d1f41 | 2025-04-08 07:20:22 +00:00
[Docs] Add Slides from Singapore Meetup (#16213)
Signed-off-by: simon-mo <simon.mo@hey.com>

Kebe | b4ac449a83 | 2025-04-08 00:18:15 -07:00
[Misc] Merge the logs of pp layers partitions (#16225)
Signed-off-by: Kebe <mail@kebe7jun.com>

Michael Goin | 8e5314a468 | 2025-04-07 23:24:07 -07:00
[V1] Add disable_chunked_mm_input arg to disable partial mm input prefill (#15837)
Signed-off-by: mgoin <mgoin64@gmail.com>

Siyuan Liu | 87918e40c4 | 2025-04-08 14:23:53 +08:00
[torch.compile][TPU] Make @support_torch_compile work for XLA backend (#15782)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

Isotr0py | f6b32efb7f | 2025-04-08 13:38:13 +08:00
[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (#16194)
Signed-off-by: Isotr0py <2037008807@qq.com>

Michael Goin | b99733d092 | 2025-04-08 05:14:15 +00:00
[Bugfix] Do not skip "empty" parts of chats that are parsable (#16219)
Signed-off-by: mgoin <mgoin64@gmail.com>

Yong Hoon Shin | 05a015d6a5 | 2025-04-08 03:59:26 +00:00
Add warning for Attention backends that do not support irope yet (#16212)