[Docs] Misc updates to TPU installation instructions (#10165)

2024-11-15 21:26:17 +00:00
commit 4f168f69a3
parent 3e8d14d8a1
1 changed file with 35 additions and 19 deletions
--- a/docs/source/getting_started/tpu-installation.rst
+++ b/docs/source/getting_started/tpu-installation.rst
@@ -44,15 +44,18 @@ Requirements
 Provision Cloud TPUs
 ====================
-You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_` 
+You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_ 
-or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_` 
+or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_ 
-API. This section shows how to create TPUs using the queued resource API. 
+API. This section shows how to create TPUs using the queued resource API. For 
-For more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_. 
+more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_. 
-`Queued resources <https://cloud.devsite.corp.google.com/tpu/docs/queued-resources>`_
+Queued resources enable you to request Cloud TPU resources in a queued manner. 
-enable you to request Cloud TPU resources in a queued manner. When you request 
+When you request queued resources, the request is added to a queue maintained by 
-queued resources, the request is added to a queue maintained by the Cloud TPU 
+the Cloud TPU service. When the requested resource becomes available, it's 
-service. When the requested resource becomes available, it's assigned to your 
+assigned to your Google Cloud project for your immediate exclusive use. 
-Google Cloud project for your immediate exclusive use. 
+
 .. note::
   In all of the following commands, replace the ALL CAPS parameter names with 
   appropriate values. See the parameter descriptions table for more information.
 Provision a Cloud TPU with the queued resource API
 --------------------------------------------------
@@ -68,6 +71,7 @@ Create a TPU v5e with 4 TPU chips:
    --runtime-version RUNTIME_VERSION \
    --service-account SERVICE_ACCOUNT
 .. list-table:: Parameter descriptions
    :header-rows: 1
@@ -81,12 +85,13 @@ Create a TPU v5e with 4 TPU chips:
    * - PROJECT_ID
      - Your Google Cloud project
    * - ZONE
-      - The `zone <https://cloud.google.com/tpu/docs/regions-zones>`_ where you 
+      - The GCP zone where you want to create your Cloud TPU. The value you use 
-        want to create your Cloud TPU.
+        depends on the version of TPUs you are using. For more information, see 
        `TPU regions and zones <https://cloud.google.com/tpu/docs/regions-zones>`_ 
    * - ACCELERATOR_TYPE
-      - The TPU version you want to use. Specify the TPU version, followed by a 
+      - The TPU version you want to use. Specify the TPU version; for example, 
-        '-' and the number of TPU cores. For example `v5e-4` specifies a v5e TPU 
+        `v5litepod-4` specifies a v5e TPU with 4 cores. For more information, 
-        with 4 cores. For more information, see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
+        see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
    * - RUNTIME_VERSION
      - The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
    * - SERVICE_ACCOUNT
@@ -98,7 +103,15 @@ Connect to your TPU using SSH:
 .. code-block:: bash
-    gcloud compute tpus tpu-vm ssh TPU_NAME
+    gcloud compute tpus tpu-vm ssh TPU_NAME --zone ZONE
 Install Miniconda
 .. code-block:: bash
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh
    source ~/.bashrc
 Create and activate a Conda environment for vLLM:
@@ -162,9 +175,11 @@ Run the Docker image with the following command:
 .. note::
-    Since TPU relies on XLA which requires static shapes, vLLM bucketizes the possible input shapes and compiles an XLA graph for each different shape.
+    Since TPU relies on XLA which requires static shapes, vLLM bucketizes the 
-    The compilation time may take 20~30 minutes in the first run.
+    possible input shapes and compiles an XLA graph for each shape. The 
-    However, the compilation time reduces to ~5 minutes afterwards because the XLA graphs are cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
+    compilation time may take 20~30 minutes in the first run. However, the 
    compilation time reduces to ~5 minutes afterwards because the XLA graphs are 
    cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
 .. tip::
@@ -173,7 +188,8 @@ Run the Docker image with the following command:
    .. code-block:: console
        from torch._C import *  # noqa: F403
-        ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory
+        ImportError: libopenblas.so.0: cannot open shared object file: No such 
        file or directory
    Install OpenBLAS with the following command: