771 Commits

Author SHA1 Message Date
Jee Jee Li
10f55fe6c5
[Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-20 19:17:12 -07:00
Rui Qiao
4cb1c05c9e
[Doc] Clarify run vllm only on one node in distributed inference (#15148)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-20 09:55:59 +08:00
Marc-Alexandre Côté
073d1ed354
[Doc] Update tip info on using latest transformers when creating a custom Dockerfile (#15070) 2025-03-19 13:33:40 +00:00
Cyrus Leung
61f412187d
[Bugfix] Re-enable Gemma3 for V1 (#14980)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-18 23:58:22 -07:00
Jennifer Zhao
228b768db6
[Doc] Minor v1_user_guide update (#15064)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-18 16:10:45 -07:00
yury-tokpanov
452e8fd968
[MODEL] Add support for Zamba2 models (#13185)
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-18 08:56:21 -07:00
Patrick von Platen
f863ffc965
[Mistral-Small 3.1] Update docs and tests (#14977)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-18 03:29:42 -07:00
Shanshan Shen
d1695758b2
[Doc][V1] Fix V1 APC doc (#14920) 2025-03-18 08:15:46 +00:00
Roger Wang
37e3806132
[Bugfix] Make Gemma3 MM V0 only for now (#14971)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-17 10:04:21 -07:00
Vadim Gimpelson
90df7f23aa
[Doc] Add guidance for using ccache with pip install -e . in doc (#14901) 2025-03-16 23:10:04 +00:00
Bryan Lu
9ed6ee92d6
[Bugfix] EAGLE output norm bug (#14464)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
2025-03-15 06:50:33 +00:00
Jennifer Zhao
aaacf17324
[Doc] V1 user guide (#13991)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-14 22:17:59 -07:00
Li, Jiang
a2ae496589
[CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
Simon Mo
877e352262
[Docs] Add new East Coast vLLM Meetup slides to README and meetups.md (#14852) 2025-03-14 22:06:38 -07:00
Yuan Tang
54a8804455
[Doc] More neutral K8s deployment guide (#14084)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-14 16:12:36 -07:00
Mark McLoughlin
9d2b4a70f4
[V1][Metrics] Updated list of deprecated metrics in v0.8 (#14695)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-15 00:45:25 +08:00
Cyrus Leung
601bd3268e
[Misc] Clean up type annotation for SupportsMultiModal (#14794)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 00:59:56 -07:00
Thien Tran
95d680b862
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it (#14681)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-13 20:43:18 -07:00
Chen Zhang
60c872d4b6
[Doc] Fix small typo in Transformers fallback (#14791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-13 20:33:12 -07:00
yasu52
3fb17d26c8
[Doc] Fix typo in documentation (#14783)
Signed-off-by: yasu52 <tsuguro4649@gmail.com>
2025-03-13 20:33:09 -07:00
Isotr0py
b1cc4dfef5
[VLM] Support loading InternVideo2.5 models as original InternVLChatModel (#14738)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 03:10:02 -07:00
Cyrus Leung
382403921f
[VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-13 02:23:12 -07:00
Woosuk Kwon
c0c25e25fa
[Model] Add support for Gemma 3 (#14660)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-12 08:36:33 -07:00
Kunshang Ji
c6e14a61ab
[Hardware][Intel GPU] upgrade IPEX dependency to 2.6.10. (#14564)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-03-11 17:11:47 +00:00
Dilip Gowda Bhagavan
07964e2f30
docs: Add documentation for s390x cpu implementation (#14198)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-11 17:02:17 +00:00
Cyrus Leung
af295e9b01
[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-11 07:59:43 -07:00
Harry Mellor
bc2d4473bf
[Docs] Make installation URLs nicer (#14556)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 10:43:08 -07:00
Harry Mellor
3b352a2f92
Correct capitalisation: VLLM -> vLLM (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 16:36:21 +00:00
Cyrus Leung
001a9c7b0d
[Doc] Update PaliGemma note to a warning (#14565)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-10 15:02:28 +00:00
Chauncey
b0746fae3d
[Frontend] support image embeds (#13955)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-10 12:36:03 +00:00
Harry Mellor
60a98b2de5
[Docs] Mention model_impl arg when explaining Transformers fallback (#14552)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 12:13:10 +00:00
Harry Mellor
206e2577fa
Move requirements into their own directory (#12547)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 16:44:35 +00:00
Harry Mellor
cfd0ae8234
Add RLHF document (#14482)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 09:51:39 +00:00
Harry Mellor
be0b399d74
Add training doc signposting to TRL (#14439)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 07:35:07 +00:00
Robin
c908a07f57
[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 07:07:32 +00:00
Robin
7b6fd6e486
[Doc]add doc for Qwen models tool calling (#14478)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 06:58:46 +00:00
youkaichao
c6359e8ca6
[v1] torch.compile integration explanation (#14437)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-08 01:55:50 +08:00
York-RDWang
f7ebad2307
[Doc] Update prefix_caching.md to match the example image (#14420) 2025-03-07 15:29:00 +00:00
Peng Li
70da0c0748
correct wrong markdown syntax (#14414)
Signed-off-by: vincent-pli <justdoit.pli@gmail.com>
2025-03-07 08:01:18 +00:00
Michael Goin
04222984f8
[Docs] Add nsight guide to profiling docs (#14298)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-06 14:19:58 -08:00
Irina Yuryeva
4f27044aab
[Doc] Correct beam_search using in generative_models.md (#14363) 2025-03-06 15:37:10 +00:00
Yanyi Liu
0ddc991f5c
[Doc] Update reasoning with stream example to use OpenAI library (#14077)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-03-06 13:20:37 +00:00
Nicolò Lucchesi
fa82b93853
[Frontend][Docs] Transcription API streaming (#13301)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-06 10:39:35 +00:00
lkchen
5d802522a7
[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 (#14275)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-03-06 08:58:41 +00:00
kYLe
1769928079
[Model] Update Paligemma multimodal processing with PromptUpdate (#14015)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-06 08:31:38 +00:00
Ce Gao
f5f7f00cd9
[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114) 2025-03-06 03:49:20 +00:00
Rui Qiao
abcc61e0af
[misc] Mention ray list nodes command to troubleshoot ray issues (#14318)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-06 02:00:36 +00:00
Simon Mo
ca2ca8de57
[Docs] Add Meta Slides (#14297)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-05 08:30:23 -08:00
DaividFrank
8f808cf86e
prefix_caching.md: Fixed typo (#14293)
Signed-off-by: Daivid Savernin-Frenk <daivid.frank@TurboNext.ai>
2025-03-05 15:43:13 +00:00
Cyrus Leung
7f89a594dd
[Doc] [3/N] Refer code examples for common cases in dev multimodal processor (#14278)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-05 12:29:50 +00:00