marko
27df5199d9
Support SHA256 as hash function in prefix caching ( #15297 )
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
2025-03-26 11:11:28 -07:00
Alex Brooks
1711b929b6
[Model] Add Reasoning Parser for Granite Models ( #14202 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
2025-03-26 14:28:07 +00:00
Harry Mellor
cf5c8f1686
Separate base model from TransformersModel
( #15467 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-03-26 18:13:38 +08:00
Cyrus Leung
997c8811d6
[Model] Support multi-image for Molmo ( #15438 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-26 11:26:33 +08:00
Cyrus Leung
3f04a7fbf2
[Doc] Update V1 user guide for multi-modality ( #15460 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-25 11:01:58 +00:00
Harry Mellor
97cfa65df7
Add pipeline parallel support to TransformersModel
( #12832 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-25 10:41:45 +08:00
Cyrus Leung
6dd55af6c9
[Doc] Update docs on handling OOM ( #15357 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-24 14:29:34 -07:00
Yuan Tang
3eb08ed9b1
[DOC] Add Kubernetes deployment guide with CPUs ( #14865 )
2025-03-24 10:48:43 -07:00
Manish Sethi
761702fd19
[Core] Integrate fastsafetensors
loader for loading model weights ( #10647 )
...
Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com>
2025-03-24 08:08:02 -07:00
Jee Jee Li
3892e58ad7
[Misc] Upgrade BNB version ( #15183 )
2025-03-24 05:51:42 +00:00
Roger Wang
9c5c81b0da
[Misc][Doc] Add note regarding loading generation_config
by default ( #15281 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-23 14:00:55 -07:00
Robin
d6cd59f122
[Frontend] Support tool calling and reasoning parser ( #14511 )
...
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-23 14:00:07 -07:00
shangmingc
50c9636d87
[V1][Usage] Refactor speculative decoding configuration and tests ( #14434 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-22 19:28:10 -10:00
Russell Bryant
b877031d80
Remove openvino support in favor of external plugin ( #15339 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 14:06:39 -07:00
Naitong Yu
2f4bd358f1
[Model] Support Tele-FLM Model ( #15023 )
...
Signed-off-by: Naitong Yu <ntyu@baai.ac.cn>
Signed-off-by: jiangxin <horizon94@outlook.com>
Co-authored-by: Jason Fang <jasonfang3900@gmail.com>
Co-authored-by: jiangxin <horizon94@outlook.com>
2025-03-22 02:04:44 -07:00
Cyrus Leung
baec0d4de9
Revert "[Feature] specify model in config.yaml ( #14855 )" ( #15293 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-21 08:30:23 -07:00
Russell Bryant
61e8c18350
[Misc] Add cProfile helpers ( #15074 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-21 04:56:09 -07:00
Wei Zeng
0fa3970deb
[Feature] specify model in config.yaml ( #14855 )
...
Signed-off-by: weizeng <weizeng@roblox.com>
2025-03-21 00:26:03 -07:00
Edwin Hernandez
7297941b38
[Doc] Update LWS docs ( #15163 )
...
Signed-off-by: Edwinhr716 <Edandres249@gmail.com>
2025-03-20 21:18:47 -07:00
Harry Mellor
6edbfa924d
Mention extra_body
as a way top pass vLLM only parameters using the OpenAI client ( #15240 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-20 19:18:36 -07:00
Jee Jee Li
10f55fe6c5
[Misc] Clean up the BitsAndBytes arguments ( #15140 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-20 19:17:12 -07:00
Rui Qiao
4cb1c05c9e
[Doc] Clarify run vllm only on one node in distributed inference ( #15148 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-20 09:55:59 +08:00
Marc-Alexandre Côté
073d1ed354
[Doc] Update tip info on using latest transformers when creating a custom Dockerfile ( #15070 )
2025-03-19 13:33:40 +00:00
Cyrus Leung
61f412187d
[Bugfix] Re-enable Gemma3 for V1 ( #14980 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-18 23:58:22 -07:00
Jennifer Zhao
228b768db6
[Doc] Minor v1_user_guide update ( #15064 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-18 16:10:45 -07:00
yury-tokpanov
452e8fd968
[MODEL] Add support for Zamba2 models ( #13185 )
...
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-18 08:56:21 -07:00
Patrick von Platen
f863ffc965
[Mistral-Small 3.1] Update docs and tests ( #14977 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-18 03:29:42 -07:00
Shanshan Shen
d1695758b2
[Doc][V1] Fix V1 APC doc ( #14920 )
2025-03-18 08:15:46 +00:00
Roger Wang
37e3806132
[Bugfix] Make Gemma3 MM V0 only for now ( #14971 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-17 10:04:21 -07:00
Vadim Gimpelson
90df7f23aa
[Doc] Add guidance for using ccache
with pip install -e .
in doc ( #14901 )
2025-03-16 23:10:04 +00:00
Bryan Lu
9ed6ee92d6
[Bugfix] EAGLE output norm bug ( #14464 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
2025-03-15 06:50:33 +00:00
Jennifer Zhao
aaacf17324
[Doc] V1 user guide ( #13991 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-14 22:17:59 -07:00
Li, Jiang
a2ae496589
[CPU] Support FP8 KV cache ( #14741 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
Simon Mo
877e352262
[Docs] Add new East Coast vLLM Meetup slides to README and meetups.md ( #14852 )
2025-03-14 22:06:38 -07:00
Yuan Tang
54a8804455
[Doc] More neutral K8s deployment guide ( #14084 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-14 16:12:36 -07:00
Mark McLoughlin
9d2b4a70f4
[V1][Metrics] Updated list of deprecated metrics in v0.8 ( #14695 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-15 00:45:25 +08:00
Cyrus Leung
601bd3268e
[Misc] Clean up type annotation for SupportsMultiModal
( #14794 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 00:59:56 -07:00
Thien Tran
95d680b862
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK
to allow disabling MoE prepack when CPU does not support it ( #14681 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-13 20:43:18 -07:00
Chen Zhang
60c872d4b6
[Doc] Fix small typo in Transformers fallback ( #14791 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-13 20:33:12 -07:00
yasu52
3fb17d26c8
[Doc] Fix typo in documentation ( #14783 )
...
Signed-off-by: yasu52 <tsuguro4649@gmail.com>
2025-03-13 20:33:09 -07:00
Isotr0py
b1cc4dfef5
[VLM] Support loading InternVideo2.5 models as original InternVLChatModel ( #14738 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 03:10:02 -07:00
Cyrus Leung
382403921f
[VLM] Support pan-and-scan for Gemma3 multi-modal processor ( #14672 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-13 02:23:12 -07:00
Woosuk Kwon
c0c25e25fa
[Model] Add support for Gemma 3 ( #14660 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-12 08:36:33 -07:00
Kunshang Ji
c6e14a61ab
[Hardware][Intel GPU] upgrade IPEX dependency to 2.6.10. ( #14564 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-03-11 17:11:47 +00:00
Dilip Gowda Bhagavan
07964e2f30
docs: Add documentation for s390x cpu implementation ( #14198 )
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-11 17:02:17 +00:00
Cyrus Leung
af295e9b01
[Bugfix] Update --hf-overrides
for Alibaba-NLP/gte-Qwen2
( #14609 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-11 07:59:43 -07:00
Harry Mellor
bc2d4473bf
[Docs] Make installation URLs nicer ( #14556 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 10:43:08 -07:00
Harry Mellor
3b352a2f92
Correct capitalisation: VLLM
-> vLLM
( #14562 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 16:36:21 +00:00
Cyrus Leung
001a9c7b0d
[Doc] Update PaliGemma note to a warning ( #14565 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-10 15:02:28 +00:00
Chauncey
b0746fae3d
[Frontend] support image embeds ( #13955 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-10 12:36:03 +00:00