97 Commits

Author SHA1 Message Date
Isotr0py
03fe18ae0f
[VLM] Add TP support for Phi-4-MM (#14453)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-08 05:57:14 -08:00
Jee Jee Li
952a074980
[Misc] Add Phi4-MM example (#14343)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-07 17:28:52 +00:00
Tyler Michael Smith
cc2f9b32c8
[Distributed] Add enable_expert_parallel arg (#14305)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-06 18:54:45 +00:00
youkaichao
151b08e0fe
[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-07 00:32:46 +08:00
Vincent
a4f1ee35d6
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 (#13997)
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-05 20:22:43 +00:00
Isotr0py
f71b00a19e
[Bugfix] Fix broken vision language example (#14292)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-05 15:57:10 +00:00
Tyler Michael Smith
72c62eae5f
[V1] EP/TP MoE + DP Attention (#13931) 2025-03-04 21:27:26 -08:00
lkchen
b3cf368d79
[V1][Molmo] Fix get_multimodal_embeddings() in molmo.py (#14161) 2025-03-04 15:43:59 +00:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing (#13971) 2025-03-02 17:34:51 -08:00
Isotr0py
fdcc405346
[Doc] Consolidate whisper and florence2 examples (#14050) 2025-02-28 22:49:15 -08:00
Isotr0py
edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models (#13320) 2025-02-27 02:06:41 -08:00
Chauncey
d08b285adf
[Misc] fixed qwen_vl_utils parameter error (#13906) 2025-02-26 08:31:53 -08:00
Jiayi Yao
2f42a4888c
[Feature] Support KV cache offloading and disagg prefill with LMCache connector. (#12953) 2025-02-25 00:38:42 -08:00
youkaichao
2382ad29d1
[ci] fix linter (#13701)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-22 20:28:59 +08:00
youkaichao
3e472d882a
[core] set up data parallel communication (#13591)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-22 19:28:59 +08:00
Harry Mellor
992e5c3d34
Merge similar examples in offline_inference into single basic example (#12737) 2025-02-20 04:53:51 -08:00
Cyrus Leung
377d10bd14
[VLM][Bugfix] Pass processor kwargs properly on init (#13516)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-19 13:13:50 +00:00
Roger Wang
b7d309860e
[V1] Update doc and examples for H2O-VL (#13349)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-16 10:35:54 +00:00
XiaobingZhang
84683fa271
[Bugfix] Offline example of disaggregated prefill (#13214) 2025-02-13 20:20:47 -08:00
Cyrus Leung
1bc3b5e71b
[VLM] Separate text-only and vision variants of the same model architecture (#13157) 2025-02-13 06:19:15 -08:00
Michael Goin
d88c8666a1
[Bugfix][Example] Fix GCed profiling server for TPU (#12792)
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-02-13 11:52:11 +08:00
Christian Pinto
974dfd4971
[Model] IBM/NASA Prithvi Geospatial model (#12830) 2025-02-11 20:34:30 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
6c4dbe23eb
[BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-02-12 00:21:50 +08:00
Farzad Abdolhosseini
08b2d845d6
[Model] Ultravox Model: Support v0.5 Release (#12912)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-02-10 22:02:48 +00:00
youkaichao
aa0ca5ebb7
[core][rlhf] add colocate example for RLHF (#12984)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-10 10:28:59 +08:00
Jee Jee Li
86222a3dab
[VLM] Merged multi-modal processor for GLM4V (#12449)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-02-08 20:32:16 +00:00
Cyrus Leung
8a69e0e20e
[CI/Build] Auto-fix Markdown files (#12941) 2025-02-08 04:25:15 -08:00
Shaoting
e31498bdcb
[Misc] Add offline test for disaggregated prefill (#12418) 2025-02-08 08:38:20 +00:00
Roger Wang
bf3b79efb8
[VLM] Qwen2.5-VL 2025-02-05 13:31:38 -08:00
youkaichao
bc1bdecebf
[core][distributed] exact ray placement control (#12732)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 02:03:19 +08:00
Thomas Parnell
bb392af434
[Doc] Replace ibm-fms with ibm-ai-platform (#12709)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-02-04 07:05:04 +00:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
    also be easily used by tools to help manage license compliance.
    
The Linux Foundation runs license scans against the codebase to help
ensure
    we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Alphi
d93bf4da85
[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM (#12069)
Signed-off-by: hzh <hezhihui_thu@163.com>
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Chenguang Li <757486878@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Shanshan Shen <467638484@qq.com>
Signed-off-by: elijah <f1renze.142857@gmail.com>
Signed-off-by: Yikun <yikunkero@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: sixgod <evethwillbeok@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com>
Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-01-29 09:24:59 +00:00
Michael Goin
fbb5bd4cef
[TPU] Add example for profiling TPU inference (#12531)
Signed-off-by: mgoin <mgoin@redhat.com>
2025-01-29 03:16:47 +00:00
Pooya Davoodi
0cc6b383d7
[Frontend] Support scores endpoint in run_batch (#12430)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-01-27 04:30:17 +00:00
zhou fan
528dbcac7d
[Model][Bugfix]: correct Aria model output (#12309)
Signed-off-by: xffxff <1247714429@qq.com>
2025-01-22 11:39:19 +00:00
youkaichao
c222f47992
[core][bugfix] configure env var during import vllm (#12209)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-20 19:35:59 +08:00
Cyrus Leung
b37d82791e
[Model] Upgrade Aria to transformers 4.48 (#12203)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 17:58:48 +08:00
Isotr0py
02798ecabe
[Model] Port deepseek-vl2 processor, remove dependency (#12169)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-18 13:59:39 +08:00
Chen Zhang
d06e824006
[Bugfix] Set enforce_eager automatically for mllama (#12127)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-16 15:30:08 -05:00
Isotr0py
62b06ba23d
[Model] Add support for deepseek-vl2-tiny model (#12068)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-16 17:14:48 +00:00
youkaichao
92e793d91a
[core] LLM.collective_rpc interface and RLHF example (#12084)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-16 20:19:52 +08:00
youkaichao
bf53e0c70b
Support torchrun and SPMD-style offline inference (#12071)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-16 19:58:53 +08:00
Isotr0py
d14e98d924
[Model] Support GGUF models newly added in transformers 4.46.0 (#9685)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-13 00:13:44 +00:00
Isotr0py
f967e51f38
[Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-12 00:17:24 -08:00
Harry Mellor
482cdc494e
[Doc] Rename offline inference examples (#11927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 23:50:29 +08:00
Harry Mellor
aba8d6ee00
[Doc] Move examples into categories (#11840)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 13:09:53 +00:00