(deployment-llamastack)=

Llama Stack

vLLM is also available via Llama Stack.

To install Llama Stack, run:

pip install llama-stack -q

Inference using OpenAI Compatible API

Then start the Llama Stack server, pointing it at your vLLM server with the following configuration:

inference:
  - provider_id: vllm0
    provider_type: remote::vllm
    config:
      url: http://127.0.0.1:8000

Please refer to this guide for more details on this remote vLLM provider.
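
Before starting Llama Stack, it can help to confirm that the vLLM server is reachable at the configured `url`. The following is a minimal sketch, not part of Llama Stack or vLLM: it assumes the vLLM server was started with its OpenAI-compatible API and serves the standard `/v1/models` route, and the helper names `models_endpoint` and `list_served_models` are hypothetical.

```python
import json
import urllib.request

# Base URL copied from the `url` field of the provider config above.
VLLM_URL = "http://127.0.0.1:8000"


def models_endpoint(base_url: str) -> str:
    """Build the model-listing URL (assumes the OpenAI-compatible /v1/models route)."""
    return base_url.rstrip("/") + "/v1/models"


def list_served_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Query the server and return the IDs of the models it reports."""
    with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload.get("data", [])]


if __name__ == "__main__":
    try:
        print(list_served_models(VLLM_URL))
    except (OSError, ValueError) as exc:
        # URLError (connection refused, timeout) is a subclass of OSError;
        # ValueError covers a response body that is not valid JSON.
        print(f"vLLM server not usable at {VLLM_URL}: {exc}")
```

If the script prints a list of model IDs, the `url` value should be usable in the provider config as written; if it reports an error, check that the vLLM server is running and listening on that host and port.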

Inference via Embedded vLLM

An inline vLLM provider is also available. The following is a sample configuration using that method:

inference:
  - provider_type: vllm
    config:
      model: Llama3.1-8B-Instruct
      tensor_parallel_size: 4