20231088/vllm

Rafael Vasquez 32aa2059ad

[Docs] Convert rST to MyST (Markdown) (#11145)

Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>

2024-12-23 22:35:38 +00:00

(deploying-with-lws)=

# Deploying with LWS

LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads. A major use case is multi-host/multi-node distributed inference.

vLLM can be deployed with LWS on Kubernetes for distributed model serving.

Please see this guide for more details on deploying vLLM on Kubernetes using LWS.
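
As a rough illustration of what such a deployment describes, the sketch below builds a skeleton `LeaderWorkerSet` manifest as a Python dict and prints it as JSON. It is not taken from the guide: the helper `build_lws_manifest`, the resource name, the image tag, and the GPU count are placeholders chosen for illustration, and the field names (`leaderWorkerTemplate`, `size`, `leaderTemplate`, `workerTemplate`) reflect the LWS `v1` API as commonly documented and should be checked against the LWS reference. The container start commands that launch vLLM on the leader and join the workers are deliberately omitted.

```python
import json


def build_lws_manifest(name="vllm", replicas=1, group_size=2,
                       image="vllm/vllm-openai:latest", gpus_per_pod=8):
    """Return a skeleton LeaderWorkerSet manifest as a plain dict.

    All defaults are illustrative placeholders, not recommended values.
    """
    def pod_template(role):
        # One container per pod; a real deployment also needs a command
        # that starts vLLM (leader) or joins the cluster (worker).
        return {
            "metadata": {"labels": {"role": role}},
            "spec": {
                "containers": [{
                    "name": f"vllm-{role}",
                    "image": image,
                    "resources": {
                        "limits": {"nvidia.com/gpu": str(gpus_per_pod)}
                    },
                }]
            },
        }

    return {
        "apiVersion": "leaderworkerset.x-k8s.io/v1",
        "kind": "LeaderWorkerSet",
        "metadata": {"name": name},
        "spec": {
            # Number of leader/worker groups (model replicas).
            "replicas": replicas,
            "leaderWorkerTemplate": {
                # Pods per group, counting the leader.
                "size": group_size,
                "leaderTemplate": pod_template("leader"),
                "workerTemplate": pod_template("worker"),
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_lws_manifest(), indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -` on a cluster where the LWS controller is installed, once the container commands and model arguments have been filled in from the guide.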
