305 Commits

Author SHA1 Message Date
Michael Goin
1744cc99ba
[Doc] Add Phi-3-medium to list of supported models (#5788) 2024-06-24 10:48:55 -07:00
Michael Goin
e72dc6cb35
[Doc] Add "Suggest edit" button to doc pages (#5789) 2024-06-24 10:26:17 -07:00
youkaichao
c246212952
[doc][faq] add warning to download models for every nodes (#5783) 2024-06-24 15:37:42 +08:00
Woosuk Kwon
8c00f9c15d
[Docs][TPU] Add installation tip for TPU (#5761) 2024-06-21 23:09:40 -07:00
Michael Goin
5b15bde539
[Doc] Documentation on supported hardware for quantization methods (#5745) 2024-06-21 12:44:29 -04:00
Roger Wang
1b2eaac316
[Bugfix][Doc] FIx Duplicate Explicit Target Name Errors (#5703) 2024-06-19 23:10:47 -07:00
Rafael Vasquez
e83db9e7e3
[Doc] Update docker references (#5614)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-06-19 15:01:45 -07:00
milo157
2bd231a7b7
[Doc] Added cerebrium as Integration option (#5553) 2024-06-18 15:56:59 -07:00
Isotr0py
daef218b55
[Model] Initialize Phi-3-vision support (#4986) 2024-06-17 19:34:33 -07:00
Kunshang Ji
728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
youkaichao
845a3f26f9
[Doc] add debugging tips for crash and multi-node debugging (#5581) 2024-06-17 10:08:01 +08:00
Sanger Steel
6e2527a7cb
[Doc] Update documentation on Tensorizer (#5471) 2024-06-14 11:27:57 -07:00
Simon Mo
cdab68dcdb
[Docs] Add ZhenFund as a Sponsor (#5548) 2024-06-14 11:17:21 -07:00
Cyrus Leung
0ce7b952f8
[Doc] Update LLaVA docs (#5437)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-13 11:22:07 -07:00
Woosuk Kwon
a65634d3ae
[Docs] Add 4th meetup slides (#5509) 2024-06-13 10:18:26 -07:00
Li, Jiang
80aa7e91fc
[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971)
Co-authored-by: Jianan Gu <jianan.gu@intel.com>
2024-06-13 09:33:14 -07:00
Cyrus Leung
b8d4dfff9c
[Doc] Update debug docs (#5438) 2024-06-12 14:49:31 -07:00
Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration (#5292) 2024-06-12 11:53:03 -07:00
youkaichao
8f89d72090
[Doc] add common case for long waiting time (#5430) 2024-06-11 11:12:13 -07:00
Nick Hill
99dac099ab
[Core][Doc] Default to multiprocessing for single-node distributed case (#5230)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-06-11 11:10:41 -07:00
Cade Daniel
89ec06c33b
[Docs] [Spec decode] Fix docs error in code example (#5427) 2024-06-11 10:31:56 -07:00
Kuntai Du
9fde251bf0
[Doc] Add an automatic prefix caching section in vllm documentation (#5324)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-11 10:24:59 -07:00
Cade Daniel
4c2ffb28ff
[Speculative decoding] Initial spec decode docs (#5400) 2024-06-11 10:15:40 -07:00
SangBin Cho
246598a6b1
[CI] docfix (#5410)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-11 01:28:50 -07:00
Roger Wang
3c4cebf751
[Doc][Typo] Fixing Missing Comma (#5403) 2024-06-11 00:20:28 -07:00
youkaichao
d8f31f2f8b
[Doc] add debugging tips (#5409) 2024-06-10 23:21:43 -07:00
Michael Goin
77c87beb06
[Doc] Add documentation for FP8 W8A8 (#5388) 2024-06-10 18:55:12 -06:00
Woosuk Kwon
cb77ad836f
[Docs] Alphabetically sort sponsors (#5386) 2024-06-10 15:17:19 -05:00
Roger Wang
856c990041
[Docs] Add Docs on Limitations of VLM Support (#5383) 2024-06-10 09:53:50 -07:00
Cyrus Leung
6b29d6fe70
[Model] Initial support for LLaVA-NeXT (#4199)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-10 12:47:15 +00:00
Roger Wang
7a9cb294ae
[Frontend] Add OpenAI Vision API Support (#5237)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-07 11:23:32 -07:00
Simon Mo
f270a39537
[Docs] Add Sequoia as sponsors (#5287) 2024-06-05 18:02:56 +00:00
Jie Fu (傅杰)
87d5abef75
[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249) 2024-06-04 09:57:51 -07:00
Breno Faria
f775a07e30
[FRONTEND] OpenAI tools support named functions (#5032) 2024-06-03 18:25:29 -05:00
Cyrus Leung
7a64d24aad
[Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
Nick Hill
657579113f
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171) 2024-05-31 17:20:19 -07:00
Chansung Park
429d89720e
add doc about serving option on dstack (#3074)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-30 10:11:07 -07:00
Cyrus Leung
a9bcc7afb2
[Doc] Use intersphinx and update entrypoints docs (#5125) 2024-05-30 09:59:23 -07:00
youkaichao
4fbcb0f27e
[Doc][Build] update after removing vllm-nccl (#5103)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-05-29 23:51:18 +00:00
Cyrus Leung
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Simon Mo
290f4ada2b
[Docs] Add Dropbox as sponsors (#5089) 2024-05-28 10:29:09 -07:00
Eric Xihui Lin
8e192ff967
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-24 22:00:52 -07:00
youkaichao
6a50f4cafa
[Doc] add ccache guide in doc (#5012)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-23 23:21:54 +00:00
Simon Mo
e941f88584
[Docs] Add acknowledgment for sponsors (#4925) 2024-05-21 00:17:25 -07:00
Isotr0py
f12c3b5b3d
[Model] Add Phi-2 LoRA support (#4886) 2024-05-21 14:24:17 +09:00
Kante Yin
8e7fb5d43a
Support to serve vLLM on Kubernetes with LWS (#4829)
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-05-16 16:37:29 -07:00
Cyrus Leung
dc72402b57
[Bugfix][Doc] Fix CI failure in docs (#4804)
This PR fixes the CI failure introduced by #4798.

The failure originates from having duplicate target names in reST, and is fixed by changing the ref targets to anonymous ones. For more information, see this discussion.

I have also changed the format of the links to be more distinct from each other.
2024-05-15 01:57:08 +09:00
Zhuohan Li
c579b750a0
[Doc] Add meetups to the doc (#4798) 2024-05-13 18:48:00 -07:00
Cyrus Leung
4bfa7e7f75
[Doc] Add API reference for offline inference (#4710) 2024-05-13 17:47:42 -07:00
Zhuohan Li
ac1fbf7fd2
[Doc] Shorten README by removing supported model list (#4796) 2024-05-13 16:23:54 -07:00