.. _deploying_with_triton:

Deploying with NVIDIA Triton
============================

The `Triton Inference Server <https://github.com/triton-inference-server>`_ hosts a tutorial demonstrating how to quickly deploy a simple `facebook/opt-125m <https://huggingface.co/facebook/opt-125m>`_ model using vLLM. Please see `Deploying a vLLM model in Triton <https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md#deploying-a-vllm-model-in-triton>`_ for more details.
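
Once a server from that tutorial is running, a client can send it a prompt over HTTP. The following is a minimal sketch, not part of the tutorial itself. It assumes the server listens on ``localhost:8000``, that the model is registered as ``vllm_model``, and that Triton's ``generate`` endpoint accepts a ``text_input`` field and returns ``text_output``. The helper names ``build_generate_request`` and ``generate`` are hypothetical. Check all of these against the tutorial before relying on them.

.. code-block:: python

    import json
    import urllib.request


    def build_generate_request(prompt, model_name="vllm_model",
                               base_url="http://localhost:8000",
                               temperature=0.0):
        """Build the URL and JSON body for a non-streaming generate call.

        The default model name, port, and field names are assumptions
        based on the Triton vLLM tutorial; adjust them to your deployment.
        """
        url = f"{base_url}/v2/models/{model_name}/generate"
        payload = {
            "text_input": prompt,
            "parameters": {"stream": False, "temperature": temperature},
        }
        return url, json.dumps(payload).encode("utf-8")


    def generate(prompt, **kwargs):
        """Send the prompt to the server and return the generated text."""
        url, body = build_generate_request(prompt, **kwargs)
        request = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            # Assumed response shape: a JSON object with a "text_output" key.
            return json.loads(response.read())["text_output"]

With the server up, ``generate("What is Triton Inference Server?")`` should return the model's completion as a string. Separating request construction from the network call makes it possible to inspect the payload without a running server.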