.. _run_on_llamastack:

Serving with Llama Stack
========================

vLLM is also available via `Llama Stack `_.

To install Llama Stack, run:

.. code-block:: console

    $ pip install llama-stack -q

Inference using OpenAI Compatible API
-------------------------------------

Then start the Llama Stack server, pointing it to your vLLM server with the following configuration:

.. code-block:: yaml

    inference:
      - provider_id: vllm0
        provider_type: remote::vllm
        config:
          url: http://127.0.0.1:8000

Please refer to `this guide `_ for more details on this remote vLLM provider.

Inference via Embedded vLLM
---------------------------

An `inline vLLM provider `_ is also available. Here is a sample configuration using that method:

.. code-block:: yaml

    inference:
      - provider_type: vllm
        config:
          model: Llama3.1-8B-Instruct
          tensor_parallel_size: 4
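
With the inline provider, vLLM runs inside the Llama Stack server process, so the ``vllm`` package itself needs to be installed in the same environment as ``llama-stack``. A minimal sketch, assuming pip is used:

.. code-block:: console

    $ pip install vllm -q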
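
For the ``remote::vllm`` provider shown earlier, the ``url`` field must instead point at an already running vLLM OpenAI-compatible server. A minimal sketch of starting one on the configured port, assuming a vLLM release that provides the ``vllm serve`` entrypoint and using ``meta-llama/Llama-3.1-8B-Instruct`` purely as an example model name:

.. code-block:: console

    $ vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000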
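
Before pointing Llama Stack at that endpoint, it can be sanity-checked through vLLM's OpenAI-compatible API, for example by listing the served models:

.. code-block:: console

    $ curl http://127.0.0.1:8000/v1/models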