738 Commits

Author SHA1 Message Date
Harry Mellor
be0b399d74
Add training doc signposting to TRL (#14439)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 07:35:07 +00:00
Robin
c908a07f57
[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 07:07:32 +00:00
Robin
7b6fd6e486
[Doc]add doc for Qwen models tool calling (#14478)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 06:58:46 +00:00
youkaichao
c6359e8ca6
[v1] torch.compile integration explanation (#14437)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-08 01:55:50 +08:00
York-RDWang
f7ebad2307
[Doc] Update prefix_caching.md to match the example image (#14420)
2025-03-07 15:29:00 +00:00
Peng Li
70da0c0748
correct wrong markdown syntax (#14414)
Signed-off-by: vincent-pli <justdoit.pli@gmail.com>
2025-03-07 08:01:18 +00:00
Michael Goin
04222984f8
[Docs] Add nsight guide to profiling docs (#14298)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-06 14:19:58 -08:00
Irina Yuryeva
4f27044aab
[Doc] Correct beam_search using in generative_models.md (#14363)
2025-03-06 15:37:10 +00:00
Yanyi Liu
0ddc991f5c
[Doc] Update reasoning with stream example to use OpenAI library (#14077)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-03-06 13:20:37 +00:00
Nicolò Lucchesi
fa82b93853
[Frontend][Docs] Transcription API streaming (#13301)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-06 10:39:35 +00:00
lkchen
5d802522a7
[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 (#14275)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-03-06 08:58:41 +00:00
kYLe
1769928079
[Model] Update Paligemma multimodal processing with PromptUpdate (#14015)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-06 08:31:38 +00:00
Ce Gao
f5f7f00cd9
[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114)
2025-03-06 03:49:20 +00:00
Rui Qiao
abcc61e0af
[misc] Mention ray list nodes command to troubleshoot ray issues (#14318)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-06 02:00:36 +00:00
Simon Mo
ca2ca8de57
[Docs] Add Meta Slides (#14297)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-05 08:30:23 -08:00
DaividFrank
8f808cf86e
prefix_caching.md: Fixed typo (#14293)
Signed-off-by: Daivid Savernin-Frenk <daivid.frank@TurboNext.ai>
2025-03-05 15:43:13 +00:00
Cyrus Leung
7f89a594dd
[Doc] [3/N] Refer code examples for common cases in dev multimodal processor (#14278)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-05 12:29:50 +00:00
Iacopo Poli
961644e6a8
[Doc] Update nginx guide: remove privileged from vllm container run and add target GPU ID (#14217)
Signed-off-by: Iacopo Poli <iacopo@lighton.ai>
2025-03-05 11:44:10 +00:00
Congcong Chen
0a995d5434
[Model] New model support for Phi-4-multimodal-instruct (#14119)
2025-03-04 20:57:01 -08:00
Mark McLoughlin
c2bd2196fc
[v1][Metrics] Add design doc (#12745)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-04 20:36:55 +00:00
Michael Goin
550c7ba3dc
[Docs] Update Dockerfile dependency image (#14215)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-04 20:22:11 +00:00
youkaichao
3610fb4930
[doc] add "Failed to infer device type" to faq (#14200)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 20:47:06 +08:00
Travis Johnson
c060b71408
[Model] Add support for GraniteMoeShared models (#13313)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-04 08:04:52 +08:00
Qubitium-ModelCloud
cd1d3c3df8
[Docs] Add GPTQModel (#14056)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-03 21:59:09 +00:00
Harry Mellor
98175b2816
Improve the docs for TransformersModel (#14147)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-03 17:03:05 +00:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing (#13971)
2025-03-02 17:34:51 -08:00
Ce Gao
bf33700ecd
[v0][structured output] Support reasoning output (#12955)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-02 14:49:42 -05:00
qux-bbb
bc6ccb9878
[Doc] Source building add clone step (#14086)
Signed-off-by: qux-bbb <1147635419@qq.com>
2025-03-02 10:59:50 +00:00
Jee Jee Li
cc5e8f6db8
[Model] Add LoRA support for TransformersModel (#13770)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-02 09:17:34 +08:00
Kuntai Du
8994dabc22
[Documentation] Add more deployment guide for Kubernetes deployment (#13841)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-03-01 06:44:24 +00:00
Brayden Zhong
f64ffa8c25
[Docs] Add pipeline_parallel_size to optimization docs (#14059)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-03-01 05:43:54 +00:00
Brayden Zhong
2aed2c9fa7
[Doc] Fix ROCm documentation (#14041)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-28 16:42:07 +00:00
Harry Mellor
f58f8b5c96
Update AutoAWQ docs (#14042)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-28 15:20:29 +00:00
Cyrus Leung
1088f06242
[Doc] Move multimodal Embedding API example to Online Serving page (#14017)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-28 07:12:04 +00:00
Cyrus Leung
f1579b229d
[VLM] Generalized prompt updates for multi-modal processor (#13964)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-27 17:44:25 +00:00
王博伟
512d77d582
Update quickstart.md (#13958)
2025-02-27 16:05:11 +00:00
Szymon Ożóg
7f0be2aa24
[Model] Deepseek GGUF support (#13167)
2025-02-27 02:08:35 -08:00
Isotr0py
edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models (#13320)
2025-02-27 02:06:41 -08:00
Michael Goin
ca377cf1b9
Use CUDA 12.4 as default for release and nightly wheels (#12098)
2025-02-26 19:06:37 -08:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling (#13831)
2025-02-25 23:43:01 -08:00
Michael Goin
07c4353057
[Model] Support Grok1 (#13795)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-26 01:07:12 +00:00
Harry Mellor
cdc1fa12eb
Remove unused kwargs from model definitions (#13555)
2025-02-24 17:13:52 -08:00
Nicolò Lucchesi
444b0f0f62
[Misc][Docs] Raise error when flashinfer is not installed and VLLM_ATTENTION_BACKEND is set (#12513)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-02-24 10:43:21 -05:00
Cyrus Leung
8354f6640c
[Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699)
2025-02-22 06:04:31 -08:00
Mark McLoughlin
2cb8c1540e
[Metrics] Add --show-hidden-metrics-for-version CLI arg (#13295)
2025-02-22 00:20:45 -08:00
Yuan Tang
8c0dd3d4df
docs: Add a note on full CI run in contributing guide (#13646)
2025-02-21 21:53:59 -08:00
Gabriel Marinho
1c3c975766
[FEATURE] Enables /score endpoint for embedding models (#12846)
2025-02-20 22:09:47 -08:00
Kante Yin
44c33f01f3
Add llmaz as another integration (#13643)
Signed-off-by: kerthcet <kerthcet@gmail.com>
2025-02-21 03:52:40 +00:00
Joe Runde
bfbc0b32c6
[Frontend] Add backend-specific options for guided decoding (#13505)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-20 15:07:58 -05:00
Harry Mellor
992e5c3d34
Merge similar examples in offline_inference into single basic example (#12737)
2025-02-20 04:53:51 -08:00