[Docs] Misc updates to TPU installation instructions (#10165)
parent 3e8d14d8a1
commit 4f168f69a3
@ -44,15 +44,18 @@ Requirements
|
|||||||
Provision Cloud TPUs
|
Provision Cloud TPUs
|
||||||
====================
|
====================
|
||||||
|
|
||||||
You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_`
|
You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_
|
||||||
or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_`
|
or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_
|
||||||
API. This section shows how to create TPUs using the queued resource API.
|
API. This section shows how to create TPUs using the queued resource API. For
|
||||||
For more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_.
|
more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_.
|
||||||
`Queued resources <https://cloud.devsite.corp.google.com/tpu/docs/queued-resources>`_
|
Queued resources enable you to request Cloud TPU resources in a queued manner.
|
||||||
enable you to request Cloud TPU resources in a queued manner. When you request
|
When you request queued resources, the request is added to a queue maintained by
|
||||||
queued resources, the request is added to a queue maintained by the Cloud TPU
|
the Cloud TPU service. When the requested resource becomes available, it's
|
||||||
service. When the requested resource becomes available, it's assigned to your
|
assigned to your Google Cloud project for your immediate exclusive use.
|
||||||
Google Cloud project for your immediate exclusive use.
|
|
||||||
|
.. note::
|
||||||
|
In all of the following commands, replace the ALL CAPS parameter names with
|
||||||
|
appropriate values. See the parameter descriptions table for more information.
|
||||||
|
|
||||||
Provision a Cloud TPU with the queued resource API
|
Provision a Cloud TPU with the queued resource API
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
@ -68,6 +71,7 @@ Create a TPU v5e with 4 TPU chips:
|
|||||||
--runtime-version RUNTIME_VERSION \
|
--runtime-version RUNTIME_VERSION \
|
||||||
--service-account SERVICE_ACCOUNT
|
--service-account SERVICE_ACCOUNT
|
||||||
|
|
||||||
|
|
||||||
.. list-table:: Parameter descriptions
|
.. list-table:: Parameter descriptions
|
||||||
:header-rows: 1
|
:header-rows: 1
|
||||||
|
|
||||||
@ -81,12 +85,13 @@ Create a TPU v5e with 4 TPU chips:
|
|||||||
* - PROJECT_ID
|
* - PROJECT_ID
|
||||||
- Your Google Cloud project
|
- Your Google Cloud project
|
||||||
* - ZONE
|
* - ZONE
|
||||||
- The `zone <https://cloud.google.com/tpu/docs/regions-zones>`_ where you
|
- The GCP zone where you want to create your Cloud TPU. The value you use
|
||||||
want to create your Cloud TPU.
|
depends on the version of TPUs you are using. For more information, see
|
||||||
|
`TPU regions and zones <https://cloud.google.com/tpu/docs/regions-zones>`_
|
||||||
* - ACCELERATOR_TYPE
|
* - ACCELERATOR_TYPE
|
||||||
- The TPU version you want to use. Specify the TPU version, followed by a
|
- The TPU version you want to use. Specify the TPU version, for example
|
||||||
'-' and the number of TPU cores. For example `v5e-4` specifies a v5e TPU
|
`v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
|
||||||
with 4 cores. For more information, see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
|
see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
|
||||||
* - RUNTIME_VERSION
|
* - RUNTIME_VERSION
|
||||||
- The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
|
- The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
|
||||||
* - SERVICE_ACCOUNT
|
* - SERVICE_ACCOUNT
|
||||||
@ -98,7 +103,15 @@ Connect to your TPU using SSH:
|
|||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
gcloud compute tpus tpu-vm ssh TPU_NAME
|
gcloud compute tpus tpu-vm ssh TPU_NAME --zone ZONE
|
||||||
|
|
||||||
|
Install Miniconda
|
||||||
|
|
||||||
|
.. code-block:: bash
|
||||||
|
|
||||||
|
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
|
||||||
|
bash Miniconda3-latest-Linux-x86_64.sh
|
||||||
|
source ~/.bashrc
|
||||||
|
|
||||||
Create and activate a Conda environment for vLLM:
|
Create and activate a Conda environment for vLLM:
|
||||||
|
|
||||||
@ -162,9 +175,11 @@ Run the Docker image with the following command:
|
|||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
Since TPU relies on XLA which requires static shapes, vLLM bucketizes the possible input shapes and compiles an XLA graph for each different shape.
|
Since TPU relies on XLA which requires static shapes, vLLM bucketizes the
|
||||||
The compilation time may take 20~30 minutes in the first run.
|
possible input shapes and compiles an XLA graph for each shape. The
|
||||||
However, the compilation time reduces to ~5 minutes afterwards because the XLA graphs are cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
|
compilation time may take 20~30 minutes in the first run. However, the
|
||||||
|
compilation time reduces to ~5 minutes afterwards because the XLA graphs are
|
||||||
|
cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
|
||||||
|
|
||||||
.. tip::
|
.. tip::
|
||||||
|
|
||||||
@ -173,7 +188,8 @@ Run the Docker image with the following command:
|
|||||||
.. code-block:: console
|
.. code-block:: console
|
||||||
|
|
||||||
from torch._C import * # noqa: F403
|
from torch._C import * # noqa: F403
|
||||||
ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory
|
ImportError: libopenblas.so.0: cannot open shared object file: No such
|
||||||
|
file or directory
|
||||||
|
|
||||||
|
|
||||||
Install OpenBLAS with the following command:
|
Install OpenBLAS with the following command:
|
||||||
|