Mor Zusman
ffa48c9146
[Model] PP support for Mamba-like models ( #10992 )
...
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-12-10 21:53:37 -05:00
Maxime Fournioux
fe2e10c71b
Add example of helm chart for vllm deployment on k8s ( #9199 )
...
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
2024-12-10 09:19:27 +00:00
Michael Goin
6d525288c1
[Docs] Add dedicated tool calling page to docs ( #10554 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-09 20:15:34 -05:00
Roger Wang
af7c4a92e6
[Doc][V1] Add V1 support column for multimodal models ( #10998 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-08 22:29:16 -08:00
Cyrus Leung
c889d5888b
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet ( #10975 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 17:20:49 +00:00
Cyrus Leung
39e227c7ae
[Model] Update multi-modal processor to support Mantis(LLaVA) model ( #10711 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 17:10:05 +00:00
Cyrus Leung
1c768fe537
[Doc] Explicitly state that InternVL 2.5 is supported ( #10978 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 16:58:02 +00:00
Sam Stoelinga
7406274041
[Doc] add KubeAI to serving integrations ( #10837 )
...
Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>
2024-12-06 17:03:56 +00:00
Cyrus Leung
aa39a8e175
[Doc] Create a new "Usage" section ( #10827 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-05 11:19:35 +08:00
Daniele
e4c34c23de
[CI/Build] improve python-only dev setup ( #9621 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-12-04 21:48:13 +00:00
Kevin H. Luu
c92acb9693
[ci/build] Update vLLM postmerge ECR repo ( #10887 )
2024-12-04 09:01:20 +00:00
Aaron Pham
9323a3153b
[Core][Performance] Add XGrammar support for guided decoding and set it as default ( #10785 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-12-03 15:17:00 +08:00
Russell Bryant
ef51831ee8
[Doc] Add github links for source code references ( #10672 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-03 06:46:07 +00:00
youkaichao
169a0ff911
[doc] add warning about comparing hf and vllm outputs ( #10805 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-01 00:41:38 -08:00
Cyrus Leung
133707123e
[Model] Replace embedding models with pooling adapter ( #10769 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 08:02:54 +08:00
wangxiyuan
7e4bbda573
[doc] format fix ( #10789 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2024-11-30 11:38:40 +00:00
Isotr0py
c83919c7a6
[Model] Add Internlm2 LoRA support ( #5064 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-28 17:29:04 +00:00
sixgod
5fc5ce0fe4
[Model] Added GLM-4 series hf format model support vllm==0.6.4 ( #10561 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-11-28 14:53:31 +00:00
罗泽轩
278be671a3
[Doc] Update model in arch_overview.rst to match comment ( #10701 )
...
Signed-off-by: spacewander <spacewanderlzx@gmail.com>
2024-11-27 23:58:39 -08:00
shunxing12345
1209261e93
[Model] Support telechat2 ( #10311 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-11-27 11:32:35 +00:00
Murali Andoorveedu
db66e018ea
[Bugfix] Fix for Spec model TP + Chunked Prefill ( #10232 )
...
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Co-authored-by: Sourashis Roy <sroy@roblox.com>
2024-11-26 09:11:16 -08:00
Sage Moore
9a88f89799
custom allreduce + torch.compile ( #10121 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-25 22:00:16 -08:00
Sanket Kale
a6760f6456
[Feature] vLLM ARM Enablement for AARCH64 CPUs ( #9228 )
...
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-11-25 18:32:39 -08:00
Shane A
9db713a1dc
[Model] Add OLMo November 2024 model ( #10503 )
2024-11-25 17:26:40 -05:00
Cyrus Leung
1b583cfefa
[Doc] Fix typos in docs ( #10636 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 10:15:45 -08:00
zhou fan
b1d920531f
[Model]: Add support for Aria model ( #10514 )
...
Signed-off-by: xffxff <1247714429@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-11-25 18:10:55 +00:00
fzyzcjy
2b0879bfc2
Super tiny little typo fix ( #10633 )
2024-11-25 13:08:30 +00:00
Cyrus Leung
ed46f14321
[Model] Support is_causal
HF config field for Qwen2 model ( #10621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 09:51:20 +00:00
Cyrus Leung
a30a605d21
[Doc] Add encoder-based models to Supported Models page ( #10616 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-25 06:34:07 +00:00
Maximilien de Bayser
214efc2c3c
Support Cross encoder models ( #10400 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-24 18:56:20 -08:00
youkaichao
e4fbb14414
[doc] update the code to add models ( #10603 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-24 11:21:40 -08:00
Michael Goin
9afa014552
Add small example to metrics.rst ( #10550 )
2024-11-21 23:43:43 +00:00
Li, Jiang
63f1fde277
[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU ( #10355 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-20 10:57:39 +00:00
wchen61
7629a9c6e5
[CI/Build] Support compilation with local cutlass path ( #10423 ) ( #10424 )
2024-11-19 21:35:50 -08:00
Cyrus Leung
b4be5a8adb
[Bugfix] Enforce no chunked prefill for embedding models ( #10470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-20 05:12:51 +00:00
Russell Bryant
5390d6664f
[Doc] Add the start of an arch overview page ( #10368 )
2024-11-19 09:52:11 +00:00
Michael Goin
74f8c2cf5f
Add openai.beta.chat.completions.parse example to structured_outputs.rst ( #10433 )
2024-11-19 04:37:46 +00:00
Yan Ma
6b2d25efc7
[Hardware][XPU] AWQ/GPTQ support for xpu backend ( #10107 )
...
Signed-off-by: yan ma <yan.ma@intel.com>
2024-11-18 11:18:05 -07:00
ismael-dm
31894a2155
[Doc] Add documentation for Structured Outputs ( #9943 )
...
Signed-off-by: ismael-dm <ismaeldm99@gmail.com>
2024-11-18 09:52:12 -08:00
B-201
4186be8111
[Doc] Update doc for LoRA support in GLM-4V ( #10425 )
...
Signed-off-by: B-201 <Joy25810@foxmail.com>
2024-11-18 15:08:30 +00:00
youkaichao
755b85359b
[doc] add doc for the plugin system ( #10372 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-15 21:46:27 -08:00
Cyrus Leung
32e46e000f
[Frontend] Automatic detection of chat content format from AST ( #9919 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-16 13:35:40 +08:00
Michael Green
4f168f69a3
[Docs] Misc updates to TPU installation instructions ( #10165 )
2024-11-15 13:26:17 -08:00
Russell Bryant
3e8d14d8a1
[Doc] Move PR template content to docs ( #10159 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-15 13:20:20 -08:00
Simon Mo
c76ac49d26
[Docs] Add Nebius as sponsors ( #10371 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-11-15 12:47:40 -08:00
Cyrus Leung
2ac6d0e75b
[Misc] Consolidate pooler config overrides ( #10351 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-15 06:59:00 +00:00
Cyrus Leung
b40cf6402e
[Model] Support Qwen2 embeddings and use tags to select model tests ( #10184 )
2024-11-14 20:23:09 -08:00
Woosuk Kwon
1dbae0329c
[Docs] Publish meetup slides ( #10331 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-14 16:19:38 +00:00
Mike Depinet
f67ce05d0b
[Frontend] Pythonic tool parser ( #9859 )
...
Signed-off-by: Mike Depinet <mike@fixie.ai>
2024-11-14 04:14:34 +00:00
youkaichao
504ac53d18
[misc] error early for old-style class ( #10304 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-13 18:55:39 -08:00