[doc] update pipeline parallel in readme (#6347)
commit 2d23b42d92
parent 1df43de9bb
README.md

```diff
@@ -56,7 +56,7 @@ vLLM is flexible and easy to use with:
 
 - Seamless integration with popular Hugging Face models
 - High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
-- Tensor parallelism support for distributed inference
+- Tensor parallelism and pipeline parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
 - Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs
```
docs/source/index.rst

```diff
@@ -38,7 +38,7 @@ vLLM is flexible and easy to use with:
 
 * Seamless integration with popular HuggingFace models
 * High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
-* Tensor parallelism support for distributed inference
+* Tensor parallelism and pipeline parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
 * Support NVIDIA GPUs and AMD GPUs
```
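For context on the feature the changed bullets describe, below is a minimal sketch of configuring distributed inference with vLLM. The model name and GPU counts are placeholder assumptions; `tensor_parallel_size` is part of the documented `LLM` API, while pipeline parallelism around the time of this commit was typically enabled through the OpenAI-compatible server's `--pipeline-parallel-size` flag, shown in the trailing comment.

```python
# Sketch: tensor-parallel offline inference with vLLM.
# Assumes a node with at least 2 GPUs; the model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-13b",   # placeholder model
    tensor_parallel_size=2,     # shard each layer's weights across 2 GPUs
)

sampling = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], sampling)
for out in outputs:
    print(out.outputs[0].text)

# Pipeline parallelism (the capability this commit documents) is enabled
# on the OpenAI-compatible API server, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model facebook/opt-13b \
#       --tensor-parallel-size 2 --pipeline-parallel-size 2
```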