[Docs] Misc updates to TPU installation instructions (#10165)

Michael Green 2024-11-15 21:26:17 +00:00 committed by GitHub
parent 3e8d14d8a1
commit 4f168f69a3


@@ -44,15 +44,18 @@ Requirements
 Provision Cloud TPUs
 ====================
-You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_`
-or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_`
-API. This section shows how to create TPUs using the queued resource API.
-For more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_.
+You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_
+or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_
+API. This section shows how to create TPUs using the queued resource API. For
+more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_.
-`Queued resources <https://cloud.devsite.corp.google.com/tpu/docs/queued-resources>`_
-enable you to request Cloud TPU resources in a queued manner. When you request
-queued resources, the request is added to a queue maintained by the Cloud TPU
-service. When the requested resource becomes available, it's assigned to your
-Google Cloud project for your immediate exclusive use.
+Queued resources enable you to request Cloud TPU resources in a queued manner.
+When you request queued resources, the request is added to a queue maintained by
+the Cloud TPU service. When the requested resource becomes available, it's
+assigned to your Google Cloud project for your immediate exclusive use.
+.. note::
+In all of the following commands, replace the ALL CAPS parameter names with
+appropriate values. See the parameter descriptions table for more information.
 Provision a Cloud TPU with the queued resource API
 --------------------------------------------------
@@ -68,6 +71,7 @@ Create a TPU v5e with 4 TPU chips:
 --runtime-version RUNTIME_VERSION \
 --service-account SERVICE_ACCOUNT
 .. list-table:: Parameter descriptions
 :header-rows: 1
@@ -81,12 +85,13 @@ Create a TPU v5e with 4 TPU chips:
 * - PROJECT_ID
 - Your Google Cloud project
 * - ZONE
-- The `zone <https://cloud.google.com/tpu/docs/regions-zones>`_ where you
-want to create your Cloud TPU.
+- The GCP zone where you want to create your Cloud TPU. The value you use
+depends on the version of TPUs you are using. For more information, see
+`TPU regions and zones <https://cloud.google.com/tpu/docs/regions-zones>`_
 * - ACCELERATOR_TYPE
-- The TPU version you want to use. Specify the TPU version, followed by a
-'-' and the number of TPU cores. For example `v5e-4` specifies a v5e TPU
-with 4 cores. For more information, see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
+- The TPU version you want to use. Specify the TPU version, for example
+`v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
+see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
 * - RUNTIME_VERSION
 - The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
 * - SERVICE_ACCOUNT
@@ -98,7 +103,15 @@ Connect to your TPU using SSH:
 .. code-block:: bash
-gcloud compute tpus tpu-vm ssh TPU_NAME
+gcloud compute tpus tpu-vm ssh TPU_NAME --zone ZONE
+Install Miniconda
+.. code-block:: bash
+wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+bash Miniconda3-latest-Linux-x86_64.sh
+source ~/.bashrc
 Create and activate a Conda environment for vLLM:
@@ -162,9 +175,11 @@ Run the Docker image with the following command:
 .. note::
-Since TPU relies on XLA which requires static shapes, vLLM bucketizes the possible input shapes and compiles an XLA graph for each different shape.
-The compilation time may take 20~30 minutes in the first run.
-However, the compilation time reduces to ~5 minutes afterwards because the XLA graphs are cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
+Since TPU relies on XLA which requires static shapes, vLLM bucketizes the
+possible input shapes and compiles an XLA graph for each shape. The
+compilation time may take 20~30 minutes in the first run. However, the
+compilation time reduces to ~5 minutes afterwards because the XLA graphs are
+cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
 .. tip::
@@ -173,7 +188,8 @@ Run the Docker image with the following command:
 .. code-block:: console
 from torch._C import * # noqa: F403
-ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory
+ImportError: libopenblas.so.0: cannot open shared object file: No such
+file or directory
 Install OpenBLAS with the following command: