Kevin H. Luu
|
4dc52e1c53
|
[CI] Reorganize .buildkite directory (#16001)
Signed-off-by: kevin <kevin@anyscale.com>
|
2025-04-04 12:16:20 -07:00 |
|
Michael Goin
|
4708f13a9c
|
[Bugfix] Fix default behavior/fallback for pp in v1 (#16057)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-04 17:58:08 +00:00 |
|
Gregory Shtrasberg
|
a6d042df0a
|
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only (#15413)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-04-04 09:40:37 -07:00 |
|
Gregory Shtrasberg
|
40a36ccfeb
|
[ROCm][Bugfix] Use platform specific FP8 dtype (#15717)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-04-04 09:40:20 -07:00 |
|
Ilya Markov
|
ef608c37a7
|
[Distributed] [ROCM] Fix custom allreduce enable checks (#16010)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-04-04 09:39:08 -07:00 |
|
Li, Jiang
|
2386803f2a
|
[CPU] Change default block_size for CPU backend (#16002)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-04-04 09:39:05 -07:00 |
|
Ziji Shi (Steven)
|
95862f7b4d
|
[Benchmark][Doc] Update throughput benchmark and README (#15998)
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-04-04 09:39:02 -07:00 |
|
Isotr0py
|
230b131b54
|
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-04 09:38:58 -07:00 |
|
liuzhenwei
|
0812d8dd41
|
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945)
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
|
2025-04-04 09:38:55 -07:00 |
|
Jonghyun Choe
|
bf7e3c51ae
|
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
|
2025-04-04 09:38:52 -07:00 |
|
Mark McLoughlin
|
a35a8a8392
|
[V1][Spec Decode] Avoid logging useless nan metrics (#16023)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-04 08:52:41 -07:00 |
|
yihong
|
4ef0bb1fcf
|
doc: add info for macos clang errors (#16049)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-04 14:58:16 +00:00 |
|
Chengji Yao
|
fadc59c0e6
|
[TPU][V1] Remove ragged attention kernel parameter hard coding (#16041)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-04-04 07:48:50 -04:00 |
|
Reid
|
86cbd2eee9
|
[Misc] improve gguf check (#15974)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-04 01:33:36 +00:00 |
|
Huy Do
|
092475f738
|
[ROCm] Tweak the benchmark script to run on ROCm (#14252)
|
2025-04-03 17:12:48 -07:00 |
|
bnellnm
|
dcc56d62da
|
[Bugfix] Fix function names in test_block_fp8.py (#16033)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-04-03 23:01:34 +00:00 |
|
Robert Shaw
|
f15e70d906
|
[TPU] Switch Test to Non-Sliding Window (#15981)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-04-03 14:28:45 -07:00 |
|
iefgnoix
|
b6be6f8d1e
|
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-04-03 14:23:28 -07:00 |
|
Alexei-V-Ivanov-AMD
|
03a70eacaf
|
Re-enable the AMD Testing for the passing tests. (#15586)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-04-03 11:05:17 -07:00 |
|
yarongmu-google
|
45b1ff7a25
|
[Misc][Performance] Advance tpu.txt to the most recent nightly torch … (#16024)
|
2025-04-03 17:32:54 +00:00 |
|
bnellnm
|
15ba07ef25
|
[Minor] Fused experts refactor (#15914)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-04-03 10:19:38 -07:00 |
|
Liangfu Chen
|
d2b58ca203
|
[Neuron][kernel] Fuse kv cache into a single tensor (#15911)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-04-03 09:51:32 -07:00 |
|
Kyle Sayers
|
82e7e19a6e
|
[SupportsQuant] Chameleon, Chatglm, Commandr (#15952)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-04-03 08:25:22 -07:00 |
|
Kyle Sayers
|
421c462948
|
[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-04-03 08:23:19 -07:00 |
|
yihong
|
84884cd9ac
|
fix: tiny fix make format.sh executable (#16015)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-03 15:18:05 +00:00 |
|
Reid
|
a43aa183dc
|
[doc] update contribution link (#15922)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-03 10:47:31 +00:00 |
|
wwl2755
|
463bbb1835
|
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-04-03 07:32:10 +00:00 |
|
youkaichao
|
5e125e74d1
|
[misc] improve error message for "Failed to infer device type" (#15994)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-04-03 14:45:03 +08:00 |
|
Ziji Shi (Steven)
|
06f21ce7a5
|
[Benchmark] Add AIMO Dataset to Benchmark (#15955)
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
|
2025-04-03 06:09:18 +00:00 |
|
Aleksandr Malyshev
|
57a810db9c
|
[ROCM][V0] PA kernel selection when no sliding window provided (#15982)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-04-03 05:28:44 +00:00 |
|
youkaichao
|
8b664706aa
|
[bugfix] add seed in torchrun_example.py (#15980)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-04-03 12:25:01 +08:00 |
|
yihong
|
37bfee92bf
|
fix: better error message for get_config close #13889 (#15943)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-03 03:53:19 +00:00 |
|
Aleksandr Malyshev
|
e73ff24e31
|
[ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
|
2025-04-02 19:48:00 -07:00 |
|
Nicolò Lucchesi
|
bd7599d34a
|
[V1][TPU] Do not compile sampling more than needed (#15883)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-03 01:36:01 +00:00 |
|
Chengji Yao
|
01b6113659
|
[TPU] optimize the all-reduce performance (#15903)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-04-03 00:25:14 +00:00 |
|
Hyesoo Yang
|
1b84eff03a
|
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
|
2025-04-02 17:18:08 -07:00 |
|
Harry Mellor
|
55acf86bf8
|
Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] (#15969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-02 23:37:30 +00:00 |
|
Michael Goin
|
f021b97993
|
[V1] Support Mistral3 in V1 (#15950)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-02 15:36:24 -07:00 |
|
youkaichao
|
1cab43c2d2
|
[misc] instruct pytorch to use nvml-based cuda check (#15951)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-04-03 01:02:58 +08:00 |
|
Nishidha
|
8bd651b318
|
Restricted cmake to be less than version 4 as 4.x breaks the build of… (#15859)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
|
2025-04-02 16:19:39 +00:00 |
|
Jee Jee Li
|
58e234a754
|
[Misc] V1 LoRA support CPU offload (#15843)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-02 23:04:43 +08:00 |
|
rongfu.leng
|
e86c414d6a
|
[Model] use AutoWeightsLoader in model load_weights (#15770)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-02 07:47:31 -07:00 |
|
Li, Jiang
|
550b2801ad
|
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-04-02 07:46:47 -07:00 |
|
Matthias Matt
|
cefb9e5a28
|
[Frontend] Implement Tool Calling with tool_choice='required' (#13483)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-04-02 07:45:45 -07:00 |
|
Mark McLoughlin
|
98d7367b61
|
[Metrics] Hide deprecated metrics (#15458)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-02 07:37:19 -07:00 |
|
Chauncey
|
594a8b9030
|
[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-02 06:33:52 -07:00 |
|
Kay Yan
|
44f990515b
|
[CI] Remove duplicate entrypoints-test (#15940)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-04-02 02:44:01 -07:00 |
|
Brayden Zhong
|
252937806c
|
[Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key (#15926)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-04-02 02:19:35 -07:00 |
|
Harry Mellor
|
51826d51fa
|
Add minimum version for huggingface_hub to enable Xet downloads (#15873)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-02 02:03:36 -07:00 |
|
Russell Bryant
|
14e53ed11f
|
[V1] Fix json_object support with xgrammar (#15488)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-02 02:00:08 -07:00 |
|