85 Commits

Author SHA1 Message Date
omer-dayan
995f56236b
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
2024-12-20 16:46:24 +00:00
Kuntai Du
38e599d6a8
[Doc] add documentation for disaggregated prefilling (#11197)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2024-12-15 13:31:16 -06:00
Russell Bryant
4863e5fba5
[Core] V1: Use multiprocessing by default (#11074)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-12-13 16:27:32 -08:00
Cyrus Leung
8f10d5e393
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
Maxime Fournioux
fe2e10c71b
Add example of helm chart for vllm deployment on k8s (#9199)
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
2024-12-10 09:19:27 +00:00
Michael Goin
6d525288c1
[Docs] Add dedicated tool calling page to docs (#10554)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-09 20:15:34 -05:00
Cyrus Leung
aa39a8e175
[Doc] Create a new "Usage" section (#10827)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-05 11:19:35 +08:00
Sanket Kale
a6760f6456
[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228)
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-11-25 18:32:39 -08:00
Russell Bryant
5390d6664f
[Doc] Add the start of an arch overview page (#10368) 2024-11-19 09:52:11 +00:00
ismael-dm
31894a2155
[Doc] Add documentation for Structured Outputs (#9943)
Signed-off-by: ismael-dm <ismaeldm99@gmail.com>
2024-11-18 09:52:12 -08:00
youkaichao
755b85359b
[doc] add doc for the plugin system (#10372)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-15 21:46:27 -08:00
youkaichao
3a28f18b0b
[doc] explain the class hierarchy in vLLM (#10240)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 22:56:44 -08:00
Shawn Du
20cf2f553c
[Misc] small fixes to function tracing file path (#9543)
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-10 15:21:06 -08:00
youkaichao
8a4358ecb5
[doc] explaining the integration with huggingface (#10173)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 01:02:54 -08:00
Russell Bryant
3a7f15a398
[Doc] Move CONTRIBUTING to docs site (#9924)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-08 05:15:12 +00:00
Rafael Vasquez
d7263a1bb8
Doc: Improve benchmark documentation (#9927)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-11-06 23:50:35 -08:00
Konrad Zawora
a02a50e6e5
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
2024-11-06 01:09:10 -08:00
Cyrus Leung
06386a64dd
[Frontend] Chat-based Embeddings API (#9759) 2024-11-01 08:13:35 +00:00
Yuan
32a1ee74a0
[Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com>
Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com>
2024-10-22 10:38:04 -07:00
Wallas Henrique
8baf85e4e9
[Doc] Compatibility matrix for mutual exclusive features (#8512)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-11 11:18:50 -07:00
TimWang
93cf74a8a7
[Doc]: Add deploying_with_k8s guide (#8451) 2024-10-07 13:31:45 -07:00
omrishiv
7c8566aa4f
[Doc] neuron documentation update (#8671)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-09-20 15:04:37 -07:00
Isotr0py
ea4647b7d7
[Doc] Add documentation for GGUF quantization (#8618) 2024-09-19 13:15:55 -06:00
William Lin
dd53c4b023
[misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
Michael Goin
b3f4e17935
[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444) 2024-08-16 13:59:16 -07:00
Simon Mo
f020a6297e
[Docs] Update readme (#7316) 2024-08-11 17:13:37 -07:00
Kuntai Du
6a1e25b151
[Doc] Add documentations for nightly benchmarks (#6412) 2024-07-25 11:57:16 -07:00
dongmao zhang
87525fab92
[bitsandbytes]: support read bnb pre-quantized model (#5753)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-23 23:45:09 +00:00
Saliya Ekanayake
a27f87da34
[Doc] Fix Typo in Doc (#6392)
Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>
2024-07-13 00:48:23 +00:00
youkaichao
2d23b42d92
[doc] update pipeline parallel in readme (#6347) 2024-07-11 11:38:40 -07:00
Cyrus Leung
9389380015
[Doc] Move guide for multimodal model and other improvements (#6168) 2024-07-06 17:18:59 +08:00
ning.zhang
83bdcb6ac3
add FAQ doc under 'serving' (#5946) 2024-07-01 14:11:36 -07:00
Ilya Lavrenov
57f09a419c
[Hardware][Intel] OpenVINO vLLM backend (#5379) 2024-06-28 13:50:16 +00:00
Cyrus Leung
5cbe8d155c
[Core] Registry for processing model inputs (#5214)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-28 12:09:56 +00:00
Michael Goin
5b15bde539
[Doc] Documentation on supported hardware for quantization methods (#5745) 2024-06-21 12:44:29 -04:00
Kunshang Ji
728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
Sanger Steel
6e2527a7cb
[Doc] Update documentation on Tensorizer (#5471) 2024-06-14 11:27:57 -07:00
Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration (#5292) 2024-06-12 11:53:03 -07:00
Kuntai Du
9fde251bf0
[Doc] Add an automatic prefix caching section in vllm documentation (#5324)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-11 10:24:59 -07:00
Cade Daniel
4c2ffb28ff
[Speculative decoding] Initial spec decode docs (#5400) 2024-06-11 10:15:40 -07:00
youkaichao
d8f31f2f8b
[Doc] add debugging tips (#5409) 2024-06-10 23:21:43 -07:00
Michael Goin
77c87beb06
[Doc] Add documentation for FP8 W8A8 (#5388) 2024-06-10 18:55:12 -06:00
Cyrus Leung
7a64d24aad
[Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
Cyrus Leung
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Simon Mo
e941f88584
[Docs] Add acknowledgment for sponsors (#4925) 2024-05-21 00:17:25 -07:00
Zhuohan Li
c579b750a0
[Doc] Add meetups to the doc (#4798) 2024-05-13 18:48:00 -07:00
Cyrus Leung
4bfa7e7f75
[Doc] Add API reference for offline inference (#4710) 2024-05-13 17:47:42 -07:00
SangBin Cho
36fb68f947
[Doc] Chunked Prefill Documentation (#4580) 2024-05-04 00:18:00 -07:00
youkaichao
2d7bce9cd5
[Doc] add env vars to the doc (#4572) 2024-05-03 05:13:49 +00:00
Prashant Gupta
b31a1fb63c
[Doc] add visualization for multi-stage dockerfile (#4456)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-04-30 17:41:59 +00:00