151 Commits

Author SHA1 Message Date
youkaichao
2d7bce9cd5
[Doc] add env vars to the doc (#4572) 2024-05-03 05:13:49 +00:00
Frαnçois
e491c7e053
[Doc] update(example model): for OpenAI compatible serving (#4503) 2024-05-01 10:14:16 -07:00
fuchen.ljl
ee37328da0
Unable to find Punica extension issue during source code installation (#4494)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-05-01 00:42:09 +00:00
Prashant Gupta
b31a1fb63c
[Doc] add visualization for multi-stage dockerfile (#4456)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-04-30 17:41:59 +00:00
SangBin Cho
a88081bf76
[CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
Hongxia Yang
cf29b7eda4
[ROCm][Hardware][AMD][Doc] Documentation update for ROCm (#4376)
Co-authored-by: WoosukKwon <woosuk.kwon@berkeley.edu>
2024-04-25 18:12:25 -07:00
Isotr0py
fbf152d976
[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-25 09:35:56 -07:00
Caio Mendes
96e90fdeb3
[Model] Adds Phi-3 support (#4298) 2024-04-25 03:06:57 +00:00
youkaichao
2768884ac4
[Doc] Add note for docker user (#4340)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-04-24 21:09:44 +00:00
Harry Mellor
34128a697e
Fix autodoc directives (#4272)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-23 01:53:01 +00:00
Zhanghao Wu
ceaf4ed003
[Doc] Update the SkyPilot doc with serving and Llama-3 (#4276) 2024-04-22 15:34:31 -07:00
Harry Mellor
3d925165f2
Add example scripts to documentation (#4225)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-22 16:36:54 +00:00
xiaoji
7f2593b164
[Doc]: Update the doc of adding new models (#4236) 2024-04-21 09:57:08 -07:00
Harry Mellor
fe7d648fe5
Don't show default value for flags in EngineArgs (#4223)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-21 09:15:28 -07:00
Harry Mellor
682789d402
Fix missing docs and out of sync EngineArgs (#4219)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-19 20:51:33 -07:00
Simon Mo
705578ae14
[Docs] document that Meta Llama 3 is supported (#4175) 2024-04-18 10:55:48 -07:00
Sanger Steel
d619ae2d19
[Doc] Add better clarity for tensorizer usage (#4090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-15 13:28:25 -07:00
Simon Mo
aceb17cf2d
[Docs] document that mixtral 8x22b is supported (#4073) 2024-04-14 14:35:55 -07:00
Sanger Steel
711a000255
[Frontend] [Core] feat: Add model loading using tensorizer (#3476) 2024-04-13 17:13:01 -07:00
Michael Feil
c2b4a1bce9
[Doc] Add typing hints / mypy types cleanup (#3816)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-11 17:17:21 -07:00
youkaichao
f3d0bf7589
[Doc][Installation] delete python setup.py develop (#3989) 2024-04-11 03:33:02 +00:00
Frαnçois
92cd2e2f21
[Doc] Fix getting stared to use publicly available model (#3963) 2024-04-10 18:05:52 +00:00
youkaichao
e35397468f
[Doc] Add doc to state our model support policy (#3948)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-10 17:03:02 +00:00
ywfang
b4543c8f6b
[Model] add minicpm (#3893) 2024-04-08 18:28:36 +08:00
youkaichao
95baec828f
[Core] enable out-of-tree model register (#3871) 2024-04-06 17:11:41 -07:00
youkaichao
d03d64fd2e
[CI/Build] refactor dockerfile & fix pip cache
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859)
2024-04-04 21:53:16 -07:00
Sean Gallen
78107fa091
[Doc]Add asynchronous engine arguments to documentation. (#3810)
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-04 21:52:01 -07:00
Adrian Abeyta
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
Roger Wang
3bec41f41a
[Doc] Fix vLLMEngine Doc Page (#3791) 2024-04-02 09:49:37 -07:00
bigPYJ1151
0e3f06fe9c
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00
youkaichao
9c82a1bec3
[Doc] Update installation doc (#3746)
[Doc] Update installation doc for build from source and explain the dependency on torch/cuda version (#3746)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-30 16:34:38 -07:00
yhu422
d8658c8cc1
Usage Stats Collection (#2852) 2024-03-28 22:16:12 -07:00
wenyujin333
d6ea427f04
[Model] Add support for Qwen2MoeModel (#3346) 2024-03-28 15:19:59 +00:00
Woosuk Kwon
6d9aa00fc4
[Docs] Add Command-R to supported models (#3669) 2024-03-27 15:20:00 -07:00
Megha Agarwal
e24336b5a7
[Model] Add support for DBRX (#3660) 2024-03-27 13:01:46 -07:00
Woosuk Kwon
e66b629c04
[Misc] Minor fix in KVCache type (#3652) 2024-03-26 23:14:06 -07:00
Jee Li
76879342a3
[Doc]add lora support (#3649) 2024-03-27 02:06:46 +00:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
youkaichao
42bc386129
[CI/Build] respect the common environment variable MAX_JOBS (#3600) 2024-03-24 17:04:00 -07:00
Lalit Pradhan
4c07dd28c0
[🚀 Ready to be merged] Added support for Jais models (#3183) 2024-03-21 09:45:24 +00:00
Jim Burtoft
63e8b28a99
[Doc] minor fix of spelling in amd-installation.rst (#3506) 2024-03-19 20:32:30 +00:00
Jim Burtoft
2a60c9bd17
[Doc] minor fix to neuron-installation.rst (#3505) 2024-03-19 13:21:35 -07:00
Simon Mo
ef65dcfa6f
[Doc] Add docs about OpenAI compatible server (#3288) 2024-03-18 22:05:34 -07:00
laneeee
8fa7357f2d
fix document error for value and v_vec illustration (#3421) 2024-03-15 16:06:09 -07:00
Sherlock Xu
b0925b3878
docs: Add BentoML deployment doc (#3336)
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
2024-03-12 10:34:30 -07:00
Zhuohan Li
4c922709b6
Add distributed model executor abstraction (#3191) 2024-03-11 11:03:45 -07:00
Philipp Moritz
657061fdce
[docs] Add LoRA support information for models (#3299) 2024-03-11 00:54:51 -07:00
Roger Wang
99c3cfb83c
[Docs] Fix Unmocked Imports (#3275) 2024-03-08 09:58:01 -08:00
Jialun Lyu
27a7b070db
Add document for vllm paged attention kernel. (#2978) 2024-03-04 09:23:34 -08:00
Liangfu Chen
d0fae88114
[DOC] add setup document to support neuron backend (#2777) 2024-03-04 01:03:51 +00:00