13 Commits

Author SHA1 Message Date
youkaichao
c2cd1a2142
[doc] update pp support (#9853)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-30 13:36:51 -07:00
Murali Andoorveedu
fc912e0886
[Models] Support Qwen model with PP (#6974)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-08-01 12:40:43 -07:00
youkaichao
f3ff63c3f4
[doc][distributed] improve multinode serving doc (#6804)
2024-07-25 15:38:32 -07:00
youkaichao
71950af726
[doc][distributed] fix doc argument order (#6691)
2024-07-23 08:55:33 -07:00
youkaichao
c051bfe4eb
[doc][distributed] doc for setting up multi-node environment (#6529)
[doc][distributed] add more doc for setting up multi-node environment (#6529)
2024-07-22 21:22:09 -07:00
Murali Andoorveedu
45ceb85a0c
[Docs] Update PP docs (#6598)
2024-07-19 16:38:21 -07:00
Cyrus Leung
5bf35a91e4
[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)
2024-07-17 07:43:21 +00:00
youkaichao
94b82e8c18
[doc][distributed] add suggestion for distributed inference (#6418)
2024-07-15 09:45:51 -07:00
Murali Andoorveedu
673dd4cae9
[Docs] Docs update for Pipeline Parallel (#6222)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-09 16:24:58 -07:00
youkaichao
4050d646e5
[doc][misc] remove deprecated api server in doc (#6037)
2024-07-01 12:52:43 -04:00
youkaichao
c246212952
[doc][faq] add warning to download models for every nodes (#5783)
2024-06-24 15:37:42 +08:00
Nick Hill
99dac099ab
[Core][Doc] Default to multiprocessing for single-node distributed case (#5230)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-06-11 11:10:41 -07:00
Zhuohan Li
2cf1a333b6
[Doc] Documentation for distributed inference (#261)
2023-06-26 11:34:23 -07:00