Russell Bryant
08a1a1121d
benchmarks: simplify test jsonschema ( #14567 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 13:39:30 +00:00
Isotr0py
1477ffc381
[VLM] Cleanup siglip legacy code and fix broken paligemma multimodal processor ( #14602 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 11:27:36 +00:00
yexin(叶鑫)
70b808fe1a
[Perf]:Optimize qwen2-vl to reduce cudaMemcpyAsync ( #14377 )
...
Signed-off-by: cynthieye <987073381@qq.com>
2025-03-11 07:39:56 +00:00
Isotr0py
63d635d179
[Misc] Correct deepseek-vl2 chat template ( #14558 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 04:37:11 +00:00
Roger Wang
1fc973c0b5
[V1][Core] Fix memory issue with logits & sampling ( #14508 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>
2025-03-11 04:03:41 +00:00
Concurrensee
c982ac5722
[Bugfix] Fix FP16 overflow for DeepSeek V2 ( #13232 )
...
Signed-off-by: Yida Wu <yida.wu@amd.com>
2025-03-10 20:46:59 -07:00
Cody Yu
4290b704ff
[V1][PP] Do not block engine core when no requests to schedule ( #14585 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-10 19:48:24 -07:00
Liangfu Chen
c91b64f749
[neuron] add reshape_and_cache ( #14391 )
2025-03-10 18:37:29 -07:00
gnovack
d6123170d5
[Neuron] Add Neuron device communicator for vLLM v1 ( #14085 )
2025-03-10 18:37:04 -07:00
Cody Yu
485afdd3cb
[MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils ( #14379 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-10 20:42:11 -04:00
Jinzhen Lin
90e88ab756
[Kernel] moe wna16 cuda kernel ( #13321 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-10 20:12:40 -04:00
Russell Bryant
04421dff8a
[V1] Prevent xgrammar from breaking TPU support ( #14575 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-10 23:06:19 +00:00
Russell Bryant
432d6dad15
Fix typo in benchmark_serving_structured_output.py ( #14566 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-10 14:58:58 -07:00
Varun Sundar Rabindranath
5ff0d32580
[V1] LoRA - Add triton kernels for V1 ( #13096 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-10 17:27:53 -04:00
Woosuk Kwon
0967110e42
[Minor] Update the tqdm bar for parallel sampling ( #14571 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-10 14:23:48 -07:00
Simon Mo
fb0acb6c72
[Perf] Improve MLA on V1 ( #14540 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-10 12:06:58 -07:00
Chauncey
92b0ce2ac7
[Bugfix][v1] fixed llava-hf/llava-1.5-7b-hf is broken on V1 ( #14554 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-10 18:24:51 +00:00
Harry Mellor
bc2d4473bf
[Docs] Make installation URLs nicer ( #14556 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 10:43:08 -07:00
Harry Mellor
3b352a2f92
Correct capitalisation: VLLM -> vLLM ( #14562 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 16:36:21 +00:00
Roger Wang
dea985aef0
[V1][Bugfix] Fix handing of second_per_grid_ts for Qwen2-VL & Qwen2.5-VL ( #14548 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-10 16:03:11 +00:00
Harry Mellor
39be30351f
Correct capitalisation: Github -> GitHub ( #14561 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 15:53:33 +00:00
Cyrus Leung
001a9c7b0d
[Doc] Update PaliGemma note to a warning ( #14565 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-10 15:02:28 +00:00
Szymon Ożóg
89cdaa83e7
[Kernel] Add more dtype support for GGUF kernels ( #14043 )
...
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-10 07:30:04 -07:00
Chauncey
b0746fae3d
[Frontend] support image embeds ( #13955 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-10 12:36:03 +00:00
Harry Mellor
60a98b2de5
[Docs] Mention model_impl arg when explaining Transformers fallback ( #14552 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 12:13:10 +00:00
Chauncey
460f553a6d
[Misc] Add log information for handle_process_request. ( #14130 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-10 08:40:50 +00:00
Jennifer Zhao
1253b15774
[Feature] Consolidate performance benchmark datasets ( #14036 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-10 07:23:11 +00:00
Martin Hoyer
dc74613fa2
[Bugfix] Wrong requirements path - rocm ( #14527 )
...
Signed-off-by: Martin Hoyer <mhoyer@redhat.com>
2025-03-10 02:49:46 +00:00
Yanyi Liu
a21076ed3a
[Misc] Ensure out-of-tree quantization method recognize by cli args ( #14328 )
...
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-03-09 12:13:31 +00:00
Chengji Yao
212007b168
[Hardware][TPU] Fix the recompiling issue in logits processor after warmup ( #14510 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-09 05:44:39 -04:00
Isotr0py
fb16eea48b
[Bugfix] Revert QKVCrossParallelLinear usage in Mllama to keep BNB quantization work ( #14498 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-09 04:47:45 +00:00
Yuchen Yan
73ae0b44e9
[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 ( #12428 )
...
Signed-off-by: Yuchen Yan <740987012@qq.com>
2025-03-08 20:14:53 -08:00
Jiayi Yao
6d7f037748
[Feat] Support chunked prefill for LMCache connector ( #14505 )
...
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2025-03-08 19:30:06 -08:00
iefgnoix
10f7552789
[V1][TPU] Remove unnecessary padding for running on TPU. ( #14467 )
2025-03-08 21:56:04 -05:00
Lucas Wilkinson
b0d541947a
[Attention] Default to FlashMLA backend for MLA ( #14451 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 18:18:39 -08:00
Robert Shaw
5f0b53c6ea
Revert "[V1][Core] Fix memory issue with logits & sampling" ( #14504 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-08 17:43:37 -08:00
22quinn
eb8b5eb183
[V1] Support bad_words in sampler ( #13376 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-08 14:50:26 -08:00
Cyrus Leung
9513290032
[Misc] Upgrade to Python 3.9 typing for additional directories ( #14492 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 17:35:50 +00:00
Russell Bryant
0d5e73d30e
Update CODEOWNERS for structured output ( #14496 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-08 17:19:51 +00:00
Isotr0py
609ef61fea
[Bugfix] Fix profiling OOM and decouple encoder multimodal profiling ( #14361 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-08 16:52:34 +00:00
Lucas Wilkinson
db84f5eb3b
[Bugfix] DeepSeek Accuracy ( #14476 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-08 16:47:03 +00:00
Harry Mellor
206e2577fa
Move requirements into their own directory ( #12547 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 16:44:35 +00:00
Cyrus Leung
e02883c400
[Misc] Don't run ruff at all on 3rd party libs ( #14493 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 07:16:40 -08:00
Russell Bryant
9085aabd62
[benchmarks] Add option to use unique jsonschema for each request ( #14457 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-08 06:36:39 -08:00
Roger Wang
8d5aa466fb
[V1][Core] Fix memory issue with logits & sampling ( #13776 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-08 06:11:04 -08:00
Aaron Pham
0b7f06b447
[Misc] add use_tqdm_on_load to reduce logs ( #14407 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-08 05:57:46 -08:00
Isotr0py
03fe18ae0f
[VLM] Add TP support for Phi-4-MM ( #14453 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-08 05:57:14 -08:00
Alexander Matveev
cb8bdfade2
[V1] TPU - Add tensor parallel support via Ray ( #13618 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-08 08:19:38 -05:00
Cyrus Leung
33f227e16b
[CI/Build] Use a fixed seed to avoid flaky tests ( #14480 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 11:30:09 +00:00
Harry Mellor
cfd0ae8234
Add RLHF document ( #14482 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 09:51:39 +00:00