Robin
8017c8db7f
[Doc]Update image to latest version ( #16186 )
...
Signed-off-by: WangErXiao <863579016@qq.com>
2025-04-07 14:17:39 +00:00
Reid
dc3529dbf6
[Misc] improve example mlpspeculator and llm_engine_example ( #16175 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-07 11:53:52 +00:00
YamPengLi
7699258ef0
[Model] Add Qwen3 and Qwen3MoE ( #15289 )
...
Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-07 04:06:41 -07:00
Shanshan Shen
e9ba99f296
[V1][Structured Output] Add supports_structured_output() method to Platform ( #16148 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-07 11:06:24 +00:00
Isotr0py
7c80368710
[VLM] Florence-2 supports online serving ( #16164 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-07 04:04:02 -07:00
yihong
95d63f38c0
doc: fix some typos in doc ( #16154 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-07 05:32:06 +00:00
Roger Wang
bb8dab821e
[CI] Set max transformers version for Ultravox model test ( #16149 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-07 04:37:58 +00:00
Isotr0py
fc0f87768a
[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings ( #16129 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-07 04:07:15 +00:00
Cyrus Leung
0a57386721
[Misc] Update Mistral-3.1 example ( #16147 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 03:57:37 +00:00
Woosuk Kwon
3749e28774
[V1][Minor] Minor simplification for get_computed_blocks ( #16139 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-06 20:38:12 -07:00
Kay Yan
86fc2321ff
[Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token ( #15202 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-04-06 20:34:51 -07:00
Martin Hoyer
2549c0dfef
Fix requires-python ( #16132 )
2025-04-06 19:22:25 -07:00
Woosuk Kwon
b10e519895
[V1][Minor] Optimize get_cached_block ( #16135 )
2025-04-06 20:48:14 +00:00
Chengji Yao
9bde5ba127
[TPU] Update PyTorch/XLA ( #16130 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-06 18:25:55 +00:00
Reid
72c8f1ad04
[Misc] update requires-python in pyproject.toml ( #16116 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-06 14:56:34 +00:00
paolovic
da224daaa9
[Bugfix] add hf_token to EngineArgs ( #16093 )
...
Signed-off-by: paolovic <paul-philipp.luley@uzh.ch>
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch>
2025-04-06 14:47:33 +00:00
Varun Sundar Rabindranath
3a100b9278
[Bugfix] LoRA : Fix the order in which the kernels process LoRAs ( #16040 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-06 14:04:50 +00:00
rongfu.leng
242a637aea
[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 ( #16103 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-06 05:52:01 -07:00
Isotr0py
c2a9671510
[Misc] Improve model redirect to accept json dictionary ( #16119 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-06 05:51:45 -07:00
Paul Schweigert
d5ae4f7f42
[Doc][Bugfix] Add missing EOF in k8s deploy doc ( #16025 )
2025-04-06 12:10:57 +00:00
Reid
b6c502a150
[Misc] refactor example eagle ( #16100 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-06 09:42:48 +00:00
Roger Wang
9ca710e525
[CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar ( #16117 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-06 16:18:00 +08:00
Ben Jackson
eb07c8cb5b
[Frontend] Fix typo in tool chat templates for llama3.2 and toolace ( #14501 )
...
Signed-off-by: Ben Jackson <ben@ben.com>
2025-04-06 07:44:36 +00:00
Hyesoo Yang
ba10801961
[Benchmark] Add sampling parameters to benchmark_serving. ( #16022 )
...
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-04-06 12:30:35 +08:00
Lucia Fang
620fc2d09e
[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 ( #16112 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-04-05 21:23:40 -07:00
Jonghyun Choe
29283eaa7e
[Model] use AutoWeightsLoader for phi, gemma, deepseek ( #16088 )
...
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
2025-04-05 20:34:38 -07:00
Jinzhen Lin
2fa66ef713
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine ( #15946 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-04-05 20:04:22 -07:00
Chauncey
13affc432d
[Misc] Remove redundant code ( #16098 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-05 20:03:50 -07:00
Reid
d8f094a92a
[Misc] format output for encoder_decoder.py ( #16095 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-05 19:57:18 -07:00
Harry Mellor
97ae6d777f
Fix some capitalisations in generated examples doc titles ( #16094 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-05 13:44:03 +00:00
yihong
6baeee70d1
Revert "doc: add info for macos clang errors ( #16049 )" ( #16091 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-05 11:51:51 +00:00
Reid
d2517a4939
[doc] fix 404 ( #16082 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-05 11:39:18 +00:00
yihong
6342adc438
fix: support clang17 for macos and fix the real libomp ( #16086 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-05 11:00:12 +00:00
Kevin H. Luu
0adba91547
[CI] Fix benchmark script level ( #16089 )
2025-04-05 03:36:01 -07:00
Tristan Leclercq
4285e423a6
[Misc] Auto detect bitsandbytes pre-quantized models ( #16027 )
...
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
2025-04-04 23:30:45 -07:00
Woosuk Kwon
63375f0cdb
[V1][Spec Decode] Update N-gram Proposer Interface ( #15750 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-04 16:32:54 -07:00
Michael Goin
70ad3f9e98
[Bugfix][TPU] Fix V1 TPU worker for sliding window ( #16059 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-04-04 23:31:19 +00:00
bnellnm
d6fc629f4d
[Kernel][Minor] Re-fuse triton moe weight application ( #16071 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-04 23:27:34 +00:00
Roger Wang
af51d80fa1
Revert "[V1] Scatter and gather placeholders in the model runner" ( #16075 )
2025-04-04 14:50:57 -07:00
Cyrus Leung
f5722a5052
[V1] Scatter and gather placeholders in the model runner ( #15712 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-04 21:26:44 +00:00
Nick Hill
651cf0fec1
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue ( #15906 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-04 12:56:43 -07:00
Kevin H. Luu
4dc52e1c53
[CI] Reorganize .buildkite directory ( #16001 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2025-04-04 12:16:20 -07:00
Michael Goin
4708f13a9c
[Bugfix] Fix default behavior/fallback for pp in v1 ( #16057 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-04 17:58:08 +00:00
Gregory Shtrasberg
a6d042df0a
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only ( #15413 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-04 09:40:37 -07:00
Gregory Shtrasberg
40a36ccfeb
[ROCm][Bugfix] Use platform specific FP8 dtype ( #15717 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-04 09:40:20 -07:00
Ilya Markov
ef608c37a7
[Distributed] [ROCM] Fix custom allreduce enable checks ( #16010 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-04-04 09:39:08 -07:00
Li, Jiang
2386803f2a
[CPU] Change default block_size for CPU backend ( #16002 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-04 09:39:05 -07:00
Ziji Shi (Steven)
95862f7b4d
[Benchmark][Doc] Update throughput benchmark and README ( #15998 )
...
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-04 09:39:02 -07:00
Isotr0py
230b131b54
[Bugfix][kernels] Fix half2float conversion in gguf kernels ( #15995 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-04 09:38:58 -07:00
liuzhenwei
0812d8dd41
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe ( #15945 )
...
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
2025-04-04 09:38:55 -07:00