Liangfu Chen
|
d2b58ca203
|
[Neuron][kernel] Fuse kv cache into a single tensor (#15911)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-04-03 09:51:32 -07:00 |
|
Kyle Sayers
|
82e7e19a6e
|
[SupportsQuant] Chameleon, Chatglm, Commandr (#15952)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-04-03 08:25:22 -07:00 |
|
Kyle Sayers
|
421c462948
|
[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-04-03 08:23:19 -07:00 |
|
yihong
|
84884cd9ac
|
fix: tiny fix make format.sh executable (#16015)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-03 15:18:05 +00:00 |
|
Reid
|
a43aa183dc
|
[doc] update contribution link (#15922)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-03 10:47:31 +00:00 |
|
wwl2755
|
463bbb1835
|
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-04-03 07:32:10 +00:00 |
|
youkaichao
|
5e125e74d1
|
[misc] improve error message for "Failed to infer device type" (#15994)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-04-03 14:45:03 +08:00 |
|
Ziji Shi (Steven)
|
06f21ce7a5
|
[Benchmark] Add AIMO Dataset to Benchmark (#15955)
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
|
2025-04-03 06:09:18 +00:00 |
|
Aleksandr Malyshev
|
57a810db9c
|
[ROCM][V0] PA kernel selection when no sliding window provided (#15982)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-04-03 05:28:44 +00:00 |
|
youkaichao
|
8b664706aa
|
[bugfix] add seed in torchrun_example.py (#15980)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-04-03 12:25:01 +08:00 |
|
yihong
|
37bfee92bf
|
fix: better error message for get_config close #13889 (#15943)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-03 03:53:19 +00:00 |
|
Aleksandr Malyshev
|
e73ff24e31
|
[ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
|
2025-04-02 19:48:00 -07:00 |
|
Nicolò Lucchesi
|
bd7599d34a
|
[V1][TPU] Do not compile sampling more than needed (#15883)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-03 01:36:01 +00:00 |
|
Chengji Yao
|
01b6113659
|
[TPU] optimize the all-reduce performance (#15903)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-04-03 00:25:14 +00:00 |
|
Hyesoo Yang
|
1b84eff03a
|
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
|
2025-04-02 17:18:08 -07:00 |
|
Harry Mellor
|
55acf86bf8
|
Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] (#15969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-02 23:37:30 +00:00 |
|
Michael Goin
|
f021b97993
|
[V1] Support Mistral3 in V1 (#15950)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-02 15:36:24 -07:00 |
|
youkaichao
|
1cab43c2d2
|
[misc] instruct pytorch to use nvml-based cuda check (#15951)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-04-03 01:02:58 +08:00 |
|
Nishidha
|
8bd651b318
|
Restricted cmake to be less than version 4 as 4.x breaks the build of… (#15859)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
|
2025-04-02 16:19:39 +00:00 |
|
Jee Jee Li
|
58e234a754
|
[Misc] V1 LoRA support CPU offload (#15843)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-02 23:04:43 +08:00 |
|
rongfu.leng
|
e86c414d6a
|
[Model] use AutoWeightsLoader in model load_weights (#15770)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-02 07:47:31 -07:00 |
|
Li, Jiang
|
550b2801ad
|
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-04-02 07:46:47 -07:00 |
|
Matthias Matt
|
cefb9e5a28
|
[Frontend] Implement Tool Calling with tool_choice='required' (#13483)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-04-02 07:45:45 -07:00 |
|
Mark McLoughlin
|
98d7367b61
|
[Metrics] Hide deprecated metrics (#15458)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-02 07:37:19 -07:00 |
|
Chauncey
|
594a8b9030
|
[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-02 06:33:52 -07:00 |
|
Kay Yan
|
44f990515b
|
[CI] Remove duplicate entrypoints-test (#15940)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-04-02 02:44:01 -07:00 |
|
Brayden Zhong
|
252937806c
|
[Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key (#15926)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-04-02 02:19:35 -07:00 |
|
Harry Mellor
|
51826d51fa
|
Add minimum version for huggingface_hub to enable Xet downloads (#15873)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-02 02:03:36 -07:00 |
|
Russell Bryant
|
14e53ed11f
|
[V1] Fix json_object support with xgrammar (#15488)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-02 02:00:08 -07:00 |
|
Eric Tang
|
ddb94c2605
|
[core] Add tags parameter to wake_up() (#15500)
Signed-off-by: Eric <erictang000@gmail.com>
|
2025-04-02 01:59:27 -07:00 |
|
LukasBluebaum
|
90969fb39a
|
[Kernel] Add more dtype support for GGUF dequantization (#15879)
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
|
2025-04-02 01:58:48 -07:00 |
|
Chris Thi
|
101f1481f9
|
[Build/CI] Update lm-eval to 0.4.8 (#15912)
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
|
2025-04-02 01:47:57 -07:00 |
|
Thien Tran
|
2edc87b161
|
[Bugfix] Fix cache block size calculation for CPU MLA (#15848)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-04-02 01:45:02 -07:00 |
|
Jee Jee Li
|
4203926f10
|
[CI/Build] Further clean up LoRA tests (#15920)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-02 01:39:09 -07:00 |
|
Chauncey
|
cdb57015a7
|
[Misc] Replace print with logger (#15923)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-02 01:37:38 -07:00 |
|
Li Wang
|
aa557e6422
|
[Benchmark]Fix error message (#15866)
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2025-04-02 01:32:24 -07:00 |
|
Roger Wang
|
0e00d40e4f
|
[V1][Bugfix] Fix typo in MoE TPU checking (#15927)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-04-01 23:46:42 -07:00 |
|
chun
|
c920e01242
|
[Doc] Update rocm.inc.md (#15917)
Signed-off-by: chun37 <chun.jb.37@gmail.com>
|
2025-04-01 23:38:26 -07:00 |
|
Woosuk Kwon
|
274d8e8818
|
[V1][Minor] Enhance SpecDecoding Metrics Log in V1 (#15902)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-01 23:38:02 -07:00 |
|
Thien Tran
|
2039c6305b
|
[Bugfix] Fix imports for MoE on CPU (#15841)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-04-02 03:33:55 +00:00 |
|
Brayden Zhong
|
6efb195a6e
|
[V1] Fix: make sure k_index is int64 for apply_top_k_only (#15907)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-04-01 19:06:44 -07:00 |
|
Ekagra Ranjan
|
24b7fb455a
|
[Spec Decode] Fix input triton kernel for eagle (#15909)
|
2025-04-01 18:15:14 -07:00 |
|
Simon Mo
|
58f5a59769
|
[Docs] Add Intel as Sponsor (#15913)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-04-01 17:16:55 -07:00 |
|
Simon Mo
|
db9dfcfa6a
|
[Docs] Add Ollama meetup slides (#15905)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-04-01 13:58:59 -07:00 |
|
Gerald
|
9ef98d527e
|
[Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454)
Signed-off-by: qscqesze <475517977@qq.com>
Co-authored-by: qingjun <qingjun@minimaxi.com>
Co-authored-by: qscqesze <475517977@qq.com>
|
2025-04-01 16:23:55 -04:00 |
|
yihong
|
93491aefc7
|
[BugFix] make sure socket close (#15875)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-01 13:10:24 -07:00 |
|
Simon Mo
|
7acd539cd7
|
[Docs] update usage stats language (#15898)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-04-01 12:54:13 -07:00 |
|
Woosuk Kwon
|
e75a6301bd
|
[V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-01 12:33:16 -07:00 |
|
Mark McLoughlin
|
a79cc68b3a
|
[V1][Metrics] Initial speculative decoding metrics (#15151)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-01 10:45:04 -07:00 |
|
Roger Wang
|
7e3f7a4ee7
|
[CI] Disable flaky structure decoding test temporarily. (#15892)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-04-01 17:42:34 +00:00 |
|