Harry Mellor
|
e6e3c55ef2
|
Move dockerfiles into their own directory (#14549)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 13:47:32 -07:00 |
|
Mark McLoughlin
|
f98a4920f9
|
[V1][Core] Remove unused speculative config from scheduler (#15818)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-31 19:15:21 +00:00 |
|
Harry Mellor
|
d4bfc23ef0
|
Fix Transformers backend compatibility check (#15290)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 10:27:07 -07:00 |
|
Alexander Matveev
|
9a2160fa55
|
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-31 13:25:20 -04:00 |
|
yihong
|
2de4118243
|
fix: change GB to GiB in logging close #14979 (#15807)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-03-31 10:00:50 -07:00 |
|
shangmingc
|
239b7befdd
|
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-31 09:19:35 -07:00 |
|
Cyrus Leung
|
09e974d483
|
[Bugfix] Check dimensions of multimodal embeddings in V1 (#15816)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-31 09:01:35 -07:00 |
|
Harry Mellor
|
e5ef4fa99a
|
Upgrade transformers to v4.50.3 (#13905)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-31 08:59:37 -07:00 |
|
Mrm
|
037bcd942c
|
[Bugfix] Fix missing return value in load_weights method of adapters.py (#15542)
Signed-off-by: noc-turne <2270929247@qq.com>
|
2025-03-31 06:56:42 -07:00 |
|
Alex Brooks
|
c2e7507ad4
|
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-03-31 13:23:53 +00:00 |
|
Naveassaf
|
3aa2b6a637
|
[Model] Update support for NemotronNAS models (#15008)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
|
2025-03-31 20:35:14 +08:00 |
|
youkaichao
|
555aa21905
|
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-31 20:22:34 +08:00 |
|
yihong
|
e7ae3bf3d6
|
fix: better install requirement for install in setup.py (#15796)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-03-31 05:13:32 -07:00 |
|
Harry Mellor
|
b932c048ac
|
Recommend developing with Python 3.12 in developer guide (#15811)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-31 11:54:49 +00:00 |
|
Charlie Fu
|
e85829450d
|
[Feature][ROCm] Enable fusion pass for torch.compile on ROCm (#15050)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-03-31 04:42:18 -07:00 |
|
Jennifer Zhao
|
effc5d24fa
|
[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
|
2025-03-31 15:38:58 +08:00 |
|
Chengyang LIU
|
18ed3132d2
|
[Misc] update the comments (#15780)
Signed-off-by: chengyang liu <lcy4869@gmail.com>
Co-authored-by: chengyang liu <lcy4869@gmail.com>
|
2025-03-30 19:39:56 -07:00 |
|
Woosuk Kwon
|
9b459eca88
|
[V1][Scheduler] Avoid calling _try_schedule_encoder_inputs for every request (#15778)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-30 14:10:42 -07:00 |
|
yihong
|
70fedd0f79
|
fix: Comments to English for better dev experience (#15768)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-03-30 10:47:57 -07:00 |
|
kYLe
|
bb103b29bf
|
[Bugfix] Added embed_is_patch mask for fuyu model (#15731)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
|
2025-03-30 03:45:08 -07:00 |
|
yihong
|
248e76c4df
|
fix: lint fix a ruff checkout syntax error (#15767)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-03-30 03:36:02 -07:00 |
|
Cyrus Leung
|
803d5c35f3
|
[V1] Override mm_counts for dummy data creation (#15703)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-30 03:20:42 -07:00 |
|
pansicheng
|
7fd8c0f85c
|
fix test_phi3v (#15321)
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
|
2025-03-30 02:01:34 -07:00 |
|
Reid
|
44c3a5abc3
|
[doc] update conda to usage link in installation (#15761)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-03-30 08:12:13 +00:00 |
|
Julien Denize
|
6909a76201
|
[Bugfix] Fix Mistral guided generation using xgrammar (#15704)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-03-29 20:20:19 -07:00 |
|
Chauncey
|
045533716b
|
[CI] xgrammar structured output supports Enum. (#15757)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-29 20:20:02 -07:00 |
|
Isotr0py
|
3c0ff914ac
|
[Bugfix] Fix Mllama interleaved images input support (#15564)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-29 18:11:15 +00:00 |
|
Woosuk Kwon
|
2bc4be4e32
|
[V1][Minor] Simplify rejection sampler's parse_output (#15741)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-29 09:25:17 -07:00 |
|
Roger Wang
|
c67abd614f
|
[V1] Support interleaved modality items (#15605)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-29 06:30:09 -07:00 |
|
shangmingc
|
6fa7cd3dbc
|
[Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore (#12957)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-29 04:01:46 -07:00 |
|
wwl2755
|
94744ba41a
|
[V1] [Feature] Collective RPC (#15444)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-29 03:39:14 -07:00 |
|
TJian
|
4965ec42d2
|
[FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-29 03:33:56 -07:00 |
|
Reid
|
73aa7041bf
|
[doc] update doc (#15740)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-03-29 04:27:22 +00:00 |
|
yarongmu-google
|
7c1f760024
|
[Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 (#15659)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-03-28 21:13:15 -07:00 |
|
Nicolò Lucchesi
|
da461f3cbf
|
[TPU][V1][Bugfix] Fix w8a8 recompilation with GSM8K (#15714)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-28 21:13:06 -07:00 |
|
Jinzhen Lin
|
5b800f0932
|
[Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoints.openai.api_server (#15700)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-28 21:12:26 -07:00 |
|
cyyever
|
8427f70493
|
Use numba 0.61 for python 3.10+ to support numpy>=2 (#15692)
Signed-off-by: cyy <cyyever@outlook.com>
|
2025-03-29 12:11:51 +08:00 |
|
Russell Bryant
|
7a7992085b
|
[CI] Speed up V1 structured output tests (#15718)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-28 21:10:45 -07:00 |
|
Varun Sundar Rabindranath
|
1286211f57
|
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-28 21:10:41 -07:00 |
|
Nick Hill
|
6d531ad7b8
|
[Misc][V1] Misc code streamlining (#15723)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-28 20:59:47 -07:00 |
|
Ce Gao
|
762b424a52
|
[Docs] Document v0 engine support in reasoning outputs (#15739)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-29 03:46:57 +00:00 |
|
pengyuange
|
de1cb38769
|
[Model] Support Skywork-R1V (#15397)
Signed-off-by: jiacai.liu <932997367@qq.com>
Co-authored-by: jiacai.liu <932997367@qq.com>
|
2025-03-28 20:39:21 -07:00 |
|
Gregory Shtrasberg
|
c802f5430d
|
[ROCm][AMD][Build] Update AMD supported arch list (#15632)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-03-28 20:39:18 -07:00 |
|
simpx
|
cff8991a50
|
[Docs][V1] Optimize diagrams in prefix caching design (#15716)
|
2025-03-29 03:33:58 +00:00 |
|
daniel-salib
|
f3f8d8fff4
|
implement prometheus fast-api-instrumentor for http service metrics (#15657)
|
2025-03-29 00:12:02 +00:00 |
|
Reid
|
26df46ee59
|
[Misc] cli auto show default value (#15582)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-03-28 22:23:00 +00:00 |
|
Alexander Matveev
|
c3f687ac22
|
[V1] TPU - Fix the chunked prompt bug (#15713)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-28 20:19:04 +00:00 |
|
Luka Govedič
|
04437e313d
|
[Bugfix] [torch.compile] Add Dynamo metrics context during compilation (#15639)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-28 14:01:09 -06:00 |
|
Robert Shaw
|
038bededba
|
[TPU] [Perf] Improve Memory Usage Estimation (#15671)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-03-28 17:37:52 +00:00 |
|
shangmingc
|
d03308be0c
|
[Misc] Remove stale func in KVTransferConfig (#14746)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-28 17:33:32 +00:00 |
|