Author | Commit | Message | Date
Woosuk Kwon | 1a8bfd92d5 | [Hardware] Initial TPU integration (#5292) | 2024-06-12 11:53:03 -07:00
Kuntai Du | 9fde251bf0 | [Doc] Add an automatic prefix caching section in vllm documentation (#5324) | 2024-06-11 10:24:59 -07:00
    Co-authored-by: simon-mo <simon.mo@hey.com>
Cade Daniel | 4c2ffb28ff | [Speculative decoding] Initial spec decode docs (#5400) | 2024-06-11 10:15:40 -07:00
youkaichao | d8f31f2f8b | [Doc] add debugging tips (#5409) | 2024-06-10 23:21:43 -07:00
Michael Goin | 77c87beb06 | [Doc] Add documentation for FP8 W8A8 (#5388) | 2024-06-10 18:55:12 -06:00
Cyrus Leung | 7a64d24aad | [Core] Support image processor (#4197) | 2024-06-02 22:56:41 -07:00
Cyrus Leung | 5ae5ed1e60 | [Core] Consolidate prompt arguments to LLM engines (#4328) | 2024-05-28 13:29:31 -07:00
    Co-authored-by: Roger Wang <ywang@roblox.com>
Simon Mo | e941f88584 | [Docs] Add acknowledgment for sponsors (#4925) | 2024-05-21 00:17:25 -07:00
Zhuohan Li | c579b750a0 | [Doc] Add meetups to the doc (#4798) | 2024-05-13 18:48:00 -07:00
Cyrus Leung | 4bfa7e7f75 | [Doc] Add API reference for offline inference (#4710) | 2024-05-13 17:47:42 -07:00
SangBin Cho | 36fb68f947 | [Doc] Chunked Prefill Documentation (#4580) | 2024-05-04 00:18:00 -07:00
youkaichao | 2d7bce9cd5 | [Doc] add env vars to the doc (#4572) | 2024-05-03 05:13:49 +00:00
Prashant Gupta | b31a1fb63c | [Doc] add visualization for multi-stage dockerfile (#4456) | 2024-04-30 17:41:59 +00:00
    Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
    Co-authored-by: Roger Wang <ywang@roblox.com>
Harry Mellor | 3d925165f2 | Add example scripts to documentation (#4225) | 2024-04-22 16:36:54 +00:00
    Co-authored-by: Harry Mellor <hmellor@oxts.com>
Adrian Abeyta | 2ff767b513 | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) | 2024-04-03 14:15:55 -07:00
    Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
    Co-authored-by: HaiShaw <hixiao@gmail.com>
    Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
    Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
    Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
    Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
    Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
    Co-authored-by: guofangze <guofangze@kuaishou.com>
    Co-authored-by: Michael Goin <mgoin64@gmail.com>
    Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
bigPYJ1151 | 0e3f06fe9c | [Hardware][Intel] Add CPU inference backend (#3634) | 2024-04-01 22:07:30 -07:00
    Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
    Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
yhu422 | d8658c8cc1 | Usage Stats Collection (#2852) | 2024-03-28 22:16:12 -07:00
Simon Mo | ef65dcfa6f | [Doc] Add docs about OpenAI compatible server (#3288) | 2024-03-18 22:05:34 -07:00
Sherlock Xu | b0925b3878 | docs: Add BentoML deployment doc (#3336) | 2024-03-12 10:34:30 -07:00
    Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
Jialun Lyu | 27a7b070db | Add document for vllm paged attention kernel. (#2978) | 2024-03-04 09:23:34 -08:00
Liangfu Chen | d0fae88114 | [DOC] add setup document to support neuron backend (#2777) | 2024-03-04 01:03:51 +00:00
Yuan Tang | 49d849b3ab | docs: Add tutorial on deploying vLLM model with KServe (#2586) | 2024-03-01 11:04:14 -08:00
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Simon Mo | f964493274 | [CI] Ensure documentation build is checked in CI (#2842) | 2024-02-12 22:53:07 -08:00
Philipp Moritz | 4ca2c358b1 | Add documentation section about LoRA (#2834) | 2024-02-12 17:24:45 +01:00
Zhuohan Li | 1af090b57d | Bump up version to v0.3.0 (#2656) | 2024-01-31 00:07:07 -08:00
Jiaxiang | 6549aef245 | [DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) | 2024-01-11 19:26:49 -08:00
Woosuk Kwon | 26c52a5ea6 | [Docs] Add CUDA graph support to docs (#2148) | 2023-12-17 01:49:20 -08:00
Woosuk Kwon | b81a6a6bb3 | [Docs] Add supported quantization methods to docs (#2135) | 2023-12-15 13:29:22 -08:00
TJian | 6ccc0bfffb | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00
    Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
    Co-authored-by: Amir Balwel <amoooori04@gmail.com>
    Co-authored-by: root <kuanfu.liu@akirakan.com>
    Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
    Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
    Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
Simon Mo | 5313c2cb8b | Add Production Metrics in Prometheus format (#1890) | 2023-12-02 16:37:44 -08:00
Massimiliano Pronesti | 05a38612b0 | docs: add instruction for langchain (#1162) | 2023-11-30 10:57:44 -08:00
Casper | a921d8be9d | [DOCS] Add engine args documentation (#1741) | 2023-11-22 12:31:27 -08:00
Casper | 8516999495 | Add Quantization and AutoAWQ to docs (#1235) | 2023-11-04 22:43:39 -07:00
Stephen Krider | 9cabcb7645 | Add Dockerfile (#1350) | 2023-10-31 12:36:47 -07:00
Tanmay Verma | 6f2dd6c37e | Add documentation to Triton server tutorial (#983) | 2023-09-20 10:32:40 -07:00
Woosuk Kwon | eda1a7cad3 | Announce paper release (#1036) | 2023-09-13 17:38:13 -07:00
Zhanghao Wu | 58df2883cb | [Doc] Add doc for running vLLM on the cloud (#426) | 2023-07-16 13:37:14 -07:00
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Zhuohan Li | 2cf1a333b6 | [Doc] Documentation for distributed inference (#261) | 2023-06-26 11:34:23 -07:00
Woosuk Kwon | 794e578de0 | [Minor] Fix URLs (#166) | 2023-06-19 22:57:14 -07:00
Woosuk Kwon | caddfc14c1 | [Minor] Fix icons in doc (#165) | 2023-06-19 20:35:38 -07:00
Woosuk Kwon | b7e62d3454 | Fix repo & documentation URLs (#163) | 2023-06-19 20:03:40 -07:00
Woosuk Kwon | 364536acd1 | [Docs] Minor fix (#162) | 2023-06-19 19:58:23 -07:00
Zhuohan Li | a255885f83 | Add logo and polish readme (#156) | 2023-06-19 16:31:13 +08:00
Woosuk Kwon | dcda03b4cb | Write README and front page of doc (#147) | 2023-06-18 03:19:38 -07:00
Zhuohan Li | bec7b2dc26 | Add quickstart guide (#148) | 2023-06-18 01:26:12 +08:00
Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00
Woosuk Kwon | 62ec38ea41 | Document supported models (#127) | 2023-06-02 22:35:17 -07:00
Woosuk Kwon | 19d2899439 | Add initial sphinx docs (#120) | 2023-05-22 17:02:44 -07:00