Beim
41bf5612f5
[Misc] fix typo: add missing space in lora adapter error message ( #12564 )
...
Signed-off-by: Beim <beim2015@outlook.com>
2025-01-30 15:39:22 +00:00
Harry Mellor
a2769032ca
Set ?device={device} when changing tab in installation guides ( #12560 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-30 00:05:42 -08:00
Mark McLoughlin
f17f1d4608
[V1][Metrics] Add GPU cache usage % gauge ( #12561 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-29 18:31:01 -08:00
Divakar Verma
1c1bb0bbf2
[Misc][MoE] add Deepseek-V3 moe tuning support ( #12558 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-01-30 00:47:30 +00:00
Woosuk Kwon
e0cc5f259a
[V1][BugFix] Free encoder cache for aborted requests ( #12545 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-29 13:47:33 -08:00
Tyler Michael Smith
73aa6cfdf7
Revert "[Build/CI] Fix libcuda.so linkage" ( #12552 )
2025-01-29 21:12:24 +00:00
Jinzhen Lin
27b78c73ca
[Kernel] add triton fused moe kernel for gptq/awq ( #12185 )
2025-01-29 09:07:09 -05:00
Pavani Majety
b02fd288b2
[Hardware][NV] Fix Modelopt model loading for k-v-scales for Llama models. ( #11787 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-01-29 01:46:12 -08:00
Yanyi Liu
ff7424f491
[Frontend] Support override generation config in args ( #12409 )
...
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-01-29 01:41:01 -08:00
Alphi
d93bf4da85
[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM ( #12069 )
...
Signed-off-by: hzh <hezhihui_thu@163.com>
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Chenguang Li <757486878@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Shanshan Shen <467638484@qq.com>
Signed-off-by: elijah <f1renze.142857@gmail.com>
Signed-off-by: Yikun <yikunkero@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: sixgod <evethwillbeok@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com>
Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-01-29 09:24:59 +00:00
Travis Johnson
036ca94c25
[Bugfix] handle alignment of arguments in convert_sparse_cross_attention_mask_to_dense ( #12347 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Wallas Santos <wallashss@ibm.com>
2025-01-29 08:54:35 +00:00
Maximilien de Bayser
ef001d98ef
Fix the pydantic logging validator ( #12420 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-01-29 07:53:13 +00:00
Robert Shaw
5f671cb4c3
[V1] Improve Error Message for Unsupported Config ( #12535 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-29 04:56:56 +00:00
Michael Goin
bd02164cf9
Bugfix for whisper quantization due to fake k_proj bias ( #12524 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-01-29 04:49:03 +00:00
Mark McLoughlin
46fb056749
[V1][Metrics] Add TTFT and TPOT histograms ( #12530 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-29 04:11:16 +00:00
Harry Mellor
dd6a3a02cb
[Doc] Convert docs to use colon fences ( #12471 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-29 11:38:29 +08:00
Ce Gao
a7e3eba66f
[Frontend] Support reasoning content for deepseek r1 ( #12473 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
2025-01-29 11:38:08 +08:00
Michael Goin
fbb5bd4cef
[TPU] Add example for profiling TPU inference ( #12531 )
...
Signed-off-by: mgoin <mgoin@redhat.com>
2025-01-29 03:16:47 +00:00
fenghuizhang
80fcc3ed1c
[Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels ( #12482 )
...
Signed-off-by: Fenghui Zhang <fhzhang@google.com>
2025-01-28 22:36:44 +00:00
Mark McLoughlin
c386c43ca3
[V1][Metrics] Add per-request prompt/generation_tokens histograms ( #12516 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-28 22:07:22 +00:00
Harry Mellor
f26d790718
Do not run suggestion pre-commit hook multiple times ( #12521 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-28 20:05:27 +00:00
Michael Goin
0f657bdc52
Replace missed warning_once for rerank API ( #12472 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-01-28 19:06:32 +00:00
Mark McLoughlin
3fd1fb63ef
[V1][Metrics] Hook up IterationStats for Prometheus metrics ( #12478 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-28 16:38:38 +00:00
Jun Duan
925d2f1908
[Doc] Fix typo for x86 CPU installation ( #12514 )
...
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-01-28 16:37:10 +00:00
Cyrus Leung
8f58a51358
[VLM] Merged multi-modal processor and V1 support for Qwen-VL ( #12504 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-28 16:25:05 +00:00
Sebastian Schoennenbeck
2079e43bee
[Core] Make raw_request optional in ServingCompletion ( #12503 )
...
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
2025-01-28 10:56:45 +00:00
Robert Shaw
e29d4358ef
[V1] Include Engine Version in Logs ( #12496 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-01-28 08:27:41 +00:00
Roger Wang
8cbc424975
Update README.md with V1 alpha release ( #12495 )
2025-01-28 08:22:41 +00:00
Mengqing Cao
dd66fd2b01
[CI] fix pre-commit error ( #12494 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-01-28 06:11:05 +00:00
Gabriel Marinho
0f465ab533
[FEATURE] Enables offline /score for embedding models ( #12021 )
...
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
2025-01-28 11:30:13 +08:00
Hossein Sarshar
23a7cbc88b
[CI/Build] Fixed the xla nightly issue report in #12451 ( #12453 )
2025-01-28 11:18:07 +08:00
Michael Goin
426a5c3625
Fix bad path in prometheus example ( #12481 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-01-27 18:56:31 -07:00
Liangfu Chen
ddee88d0ff
[Neuron][Kernel] NKI-based flash-attention kernel with paged KV cache ( #11277 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: Jiangfei Duan <jfduan@outlook.com>
2025-01-27 17:31:16 -08:00
Harry Mellor
823ab79633
Update pre-commit hooks ( #12475 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00
Nicolò Lucchesi
6116ca8cd7
[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and prompt_logprobs with ChunkedPrefill ( #10132 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wallashss <wallashss@ibm.com>
Co-authored-by: wallashss <wallashss@ibm.com>
2025-01-27 13:38:35 -08:00
Bowen Wang
2bc3fbba0c
[FlashInfer] Upgrade to 0.2.0 ( #11194 )
...
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-01-27 18:19:24 +00:00
Woosuk Kwon
3f1fc7425a
[V1][CI/Test] Do basic test for top-p & top-k sampling ( #12469 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-27 09:40:04 -08:00
Mark McLoughlin
01ba927040
[V1][Metrics] Add initial Prometheus logger ( #12416 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-27 12:26:28 -05:00
Lucas Wilkinson
103bd17ac5
[Build] Only build 9.0a for scaled_mm and sparse kernels ( #12339 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-01-27 10:40:00 -05:00
Isotr0py
ce69f7f754
[Bugfix] Fix gpt2 GGUF inference ( #12467 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-27 18:31:49 +08:00
Woosuk Kwon
624a1e4711
[V1][Minor] Minor optimizations for update_from_output ( #12454 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-27 01:09:27 -08:00
Isotr0py
372bf0890b
[Bugfix] Fix missing seq_start_loc in xformers prefill metadata ( #12464 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-27 07:25:30 +00:00
Cyrus Leung
5204ff5c3f
[Bugfix] Fix Granite 3.0 MoE model loading ( #12446 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-26 21:26:44 -08:00
Pooya Davoodi
0cc6b383d7
[Frontend] Support scores endpoint in run_batch ( #12430 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-01-27 04:30:17 +00:00
Woosuk Kwon
28e0750847
[V1] Avoid list creation in input preparation ( #12457 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-26 19:57:56 -08:00
Yuan Tang
582cf78798
[DOC] Add link to vLLM blog ( #12460 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-27 03:46:19 +00:00
Kyle Mistele
0034b09ceb
[Frontend] Rerank API (Jina- and Cohere-compatible API) ( #12376 )
...
Signed-off-by: Kyle Mistele <kyle@mistele.com>
2025-01-26 19:58:45 -07:00
Tyler Michael Smith
72bac73067
[Build/CI] Fix libcuda.so linkage ( #12424 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-26 21:18:19 +00:00
Lucas Wilkinson
68f11149d8
[Bugfix][Kernel] Fix perf regression caused by PR #12405 ( #12434 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-01-26 11:09:34 -08:00
Tyler Michael Smith
72f4880425
[Bugfix/CI] Fix broken kernels/test_mha.py ( #12450 )
2025-01-26 10:39:03 -08:00