Cyrus Leung
|
3f3e92e1f2
|
[Model] Automatic conversion of classification and reward models (#11469)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-24 18:22:22 +00:00 |
|
Cyrus Leung
|
9edca6bf8f
|
[Frontend] Online Pooling API (#11457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-24 17:54:30 +08:00 |
|
Rafael Vasquez
|
32aa2059ad
|
[Docs] Convert rST to MyST (Markdown) (#11145)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-12-23 22:35:38 +00:00 |
|
Yuan Tang
|
2e726680b3
|
[Bugfix] torch nightly version in ROCm installation guide (#11423)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2024-12-23 17:20:22 +00:00 |
|
youkaichao
|
5d2248d81a
|
[doc] explain nccl requirements for rlhf (#11381)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-20 13:00:56 -08:00 |
|
omer-dayan
|
995f56236b
|
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
|
2024-12-20 16:46:24 +00:00 |
|
youkaichao
|
1ecc645b8f
|
[doc] backward compatibility for 0.6.4 (#11359)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-19 21:33:53 -08:00 |
|
youkaichao
|
7801f56ed7
|
[ci][gh200] dockerfile clean up (#11351)
Signed-off-by: drikster80 <ed.sealing@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: drikster80 <ed.sealing@gmail.com>
Co-authored-by: cenzhiyao <2523403608@qq.com>
|
2024-12-19 18:13:06 -08:00 |
|
Yehoshua Cohen
|
6c7f881541
|
[Model] Add JambaForSequenceClassification model (#10860)
Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 22:48:06 +08:00 |
|
Travis Johnson
|
17ca964273
|
[Model] IBM Granite 3.1 (#11307)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-12-19 11:27:24 +08:00 |
|
kYLe
|
66d4b16724
|
[Frontend] Add OpenAI API support for input_audio (#11027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-16 22:09:58 -08:00 |
|
youkaichao
|
35bae114a8
|
fix gh200 tests on main (#11246)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-16 17:22:38 -08:00 |
|
bk-TurbaAI
|
35ffa682b1
|
[Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-12-16 22:20:39 +00:00 |
|
Jani Monoses
|
bddbbcb132
|
[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203)
|
2024-12-16 09:56:19 +00:00 |
|
cennn
|
b3b1526f03
|
WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212)
Signed-off-by: drikster80 <ed.sealing@gmail.com>
Co-authored-by: drikster80 <ed.sealing@gmail.com>
|
2024-12-16 09:20:49 +00:00 |
|
AlexHe99
|
da6f409246
|
Update deploying_with_k8s.rst (#10922)
|
2024-12-15 16:33:58 -08:00 |
|
Kuntai Du
|
38e599d6a8
|
[Doc] add documentation for disaggregated prefilling (#11197)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
|
2024-12-15 13:31:16 -06:00 |
|
Jee Jee Li
|
15859f2357
|
[Misc] Upgrade bitsandbytes to the latest version 0.45.0 (#11201)
|
2024-12-15 03:03:06 +00:00 |
|
Russell Bryant
|
4863e5fba5
|
[Core] V1: Use multiprocessing by default (#11074)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-13 16:27:32 -08:00 |
|
Cyrus Leung
|
0920ab9131
|
[Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-14 00:22:22 +08:00 |
|
Cyrus Leung
|
eeec9e3390
|
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-13 10:40:07 +00:00 |
|
Jani Monoses
|
7cd7409142
|
PaliGemma 2 support (#11142)
|
2024-12-13 07:40:07 +00:00 |
|
Ramon Ziai
|
d4d5291cc2
|
fix(docs): typo in helm install instructions (#11141)
Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>
|
2024-12-12 17:36:32 +00:00 |
|
Pooya Davoodi
|
1da8f0e1dd
|
[Model] Add support for embedding model GritLM (#10816)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
|
2024-12-12 06:39:16 +00:00 |
|
Yuan Tang
|
24a36d6d5f
|
Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2024-12-12 02:39:21 +00:00 |
|
bingps
|
fd22220687
|
[Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103)
Signed-off-by: Guangda Liu <bingps@users.noreply.github.com>
Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>
|
2024-12-11 15:43:24 +00:00 |
|
Cyrus Leung
|
cad5c0a6ed
|
[Doc] Update docs to refer to pooling models (#11093)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 13:36:27 +00:00 |
|
Cyrus Leung
|
8f10d5e393
|
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 01:28:00 -08:00 |
|
Mor Zusman
|
ffa48c9146
|
[Model] PP support for Mamba-like models (#10992)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
|
2024-12-10 21:53:37 -05:00 |
|
Maxime Fournioux
|
fe2e10c71b
|
Add example of helm chart for vllm deployment on k8s (#9199)
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
|
2024-12-10 09:19:27 +00:00 |
|
Michael Goin
|
6d525288c1
|
[Docs] Add dedicated tool calling page to docs (#10554)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-09 20:15:34 -05:00 |
|
Roger Wang
|
af7c4a92e6
|
[Doc][V1] Add V1 support column for multimodal models (#10998)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-12-08 22:29:16 -08:00 |
|
Cyrus Leung
|
c889d5888b
|
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:20:49 +00:00 |
|
Cyrus Leung
|
39e227c7ae
|
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:10:05 +00:00 |
|
Cyrus Leung
|
1c768fe537
|
[Doc] Explicitly state that InternVL 2.5 is supported (#10978)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 16:58:02 +00:00 |
|
Sam Stoelinga
|
7406274041
|
[Doc] add KubeAI to serving integrations (#10837)
Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>
|
2024-12-06 17:03:56 +00:00 |
|
Cyrus Leung
|
aa39a8e175
|
[Doc] Create a new "Usage" section (#10827)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 11:19:35 +08:00 |
|
Daniele
|
e4c34c23de
|
[CI/Build] improve python-only dev setup (#9621)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-04 21:48:13 +00:00 |
|
Kevin H. Luu
|
c92acb9693
|
[ci/build] Update vLLM postmerge ECR repo (#10887)
|
2024-12-04 09:01:20 +00:00 |
|
Aaron Pham
|
9323a3153b
|
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 15:17:00 +08:00 |
|
Russell Bryant
|
ef51831ee8
|
[Doc] Add github links for source code references (#10672)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-03 06:46:07 +00:00 |
|
youkaichao
|
169a0ff911
|
[doc] add warning about comparing hf and vllm outputs (#10805)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-01 00:41:38 -08:00 |
|
Cyrus Leung
|
133707123e
|
[Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 08:02:54 +08:00 |
|
wangxiyuan
|
7e4bbda573
|
[doc] format fix (#10789)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-11-30 11:38:40 +00:00 |
|
Isotr0py
|
c83919c7a6
|
[Model] Add Internlm2 LoRA support (#5064)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-28 17:29:04 +00:00 |
|
sixgod
|
5fc5ce0fe4
|
[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-11-28 14:53:31 +00:00 |
|
罗泽轩
|
278be671a3
|
[Doc] Update model in arch_overview.rst to match comment (#10701)
Signed-off-by: spacewander <spacewanderlzx@gmail.com>
|
2024-11-27 23:58:39 -08:00 |
|
shunxing12345
|
1209261e93
|
[Model] Support telechat2 (#10311)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: xiangw2 <xiangw2@chinatelecom.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-11-27 11:32:35 +00:00 |
|
Murali Andoorveedu
|
db66e018ea
|
[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Co-authored-by: Sourashis Roy <sroy@roblox.com>
|
2024-11-26 09:11:16 -08:00 |
|
Sage Moore
|
9a88f89799
|
custom allreduce + torch.compile (#10121)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 22:00:16 -08:00 |
|