.. _installation_tpu:
Installation with TPU
=====================
vLLM supports Google Cloud TPUs using PyTorch XLA.
Requirements
------------
* Google Cloud TPU VM (single and multi-host)
* TPU versions: v5e, v5p, v4
* Python: 3.10
Installation options:

1. :ref:`Build a docker image with Dockerfile <build_docker_tpu>`
2. :ref:`Build from source <build_from_source_tpu>`
.. _build_docker_tpu:
Build a docker image with :code:`Dockerfile.tpu`
--------------------------------------------------
`Dockerfile.tpu <https://github.com/vllm-project/vllm/blob/main/Dockerfile.tpu>`_ is provided to build a docker image with TPU support.
.. code-block:: console

    $ docker build -f Dockerfile.tpu -t vllm-tpu .
You can run the docker image with the following command:
.. code-block:: console

    $ # Make sure to add `--privileged --net host --shm-size=16G`.
    $ docker run --privileged --net host --shm-size=16G -it vllm-tpu
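Once inside the container, you can start a vLLM server as usual. The invocation below is a sketch, not the only supported entry point, and :code:`<your-model>` is a placeholder for any model vLLM supports on TPU:

```shell
$ # Inside the container: launch the OpenAI-compatible API server.
$ # <your-model> is a placeholder, e.g. a Hugging Face model ID.
$ python -m vllm.entrypoints.openai.api_server --model <your-model>
```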
.. _build_from_source_tpu:
Build from source
-----------------
You can also build and install the TPU backend from source.
First, install the dependencies:
.. code-block:: console

    $ # (Recommended) Create a new conda environment.
    $ conda create -n myenv python=3.10 -y
    $ conda activate myenv

    $ # Clean up the existing torch and torch-xla packages.
    $ pip uninstall torch torch-xla -y

    $ # Install PyTorch and PyTorch XLA.
    $ export DATE="20240828"
    $ export TORCH_VERSION="2.5.0"
    $ pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch-${TORCH_VERSION}.dev${DATE}-cp310-cp310-linux_x86_64.whl
    $ pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-${TORCH_VERSION}.dev${DATE}-cp310-cp310-linux_x86_64.whl

    $ # Install JAX and Pallas.
    $ pip install torch_xla[tpu] -f https://storage.googleapis.com/libtpu-releases/index.html
    $ pip install torch_xla[pallas] -f https://storage.googleapis.com/jax-releases/jax_nightly_releases.html -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_releases.html

    $ # Install other build dependencies.
    $ pip install -r requirements-tpu.txt
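The nightly wheel URLs above follow a fixed naming pattern built from :code:`TORCH_VERSION` and :code:`DATE`. Just to make that scheme explicit, here is a small illustrative helper (hypothetical, not part of vLLM), hard-coded to the Python 3.10 / linux_x86_64 wheels used in this guide:

```python
def nightly_wheel_url(package: str, torch_version: str, date: str) -> str:
    """Build the pytorch-xla-releases nightly wheel URL used above
    (assumes Python 3.10 on linux_x86_64, as in this guide)."""
    base = "https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm"
    return f"{base}/{package}-{torch_version}.dev{date}-cp310-cp310-linux_x86_64.whl"

# Reproduces the torch wheel URL from the commands above.
print(nightly_wheel_url("torch", "2.5.0", "20240828"))
```

If you bump :code:`DATE` or :code:`TORCH_VERSION`, both the :code:`torch` and :code:`torch_xla` wheels must be updated together so their versions stay in sync.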
Next, build vLLM from source. This will only take a few seconds:
.. code-block:: console

    $ VLLM_TARGET_DEVICE="tpu" python setup.py develop
.. note::

    Since TPU relies on XLA, which requires static shapes, vLLM bucketizes the possible input shapes and compiles an XLA graph for each shape.
    The first run may take 20~30 minutes to compile.
    Afterwards, the compilation time drops to ~5 minutes because the XLA graphs are cached on disk (in :code:`VLLM_XLA_CACHE_PATH`, or :code:`~/.cache/vllm/xla_cache` by default).
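The bucketing idea can be sketched as follows. This is an illustrative toy, not vLLM's actual implementation, and the bucket sizes are made up: each incoming sequence length is padded up to the nearest bucket, so XLA only ever sees a fixed set of shapes and compiles one graph per bucket.

```python
def pad_to_bucket(seq_len: int, buckets: list[int]) -> int:
    """Return the smallest bucket that fits seq_len, so XLA only ever
    sees a fixed set of input shapes (one compiled graph per bucket)."""
    for bucket in sorted(buckets):
        if seq_len <= bucket:
            return bucket
    raise ValueError(f"sequence length {seq_len} exceeds the largest bucket")

# Illustrative bucket sizes only; not vLLM's actual values.
BUCKETS = [16, 32, 64, 128, 256, 512]
print(pad_to_bucket(100, BUCKETS))  # padded to 128
```

Padding wastes some compute on shorter sequences, but it bounds the number of distinct shapes, which is what keeps the one-time compilation cost finite.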
.. tip::

    If you encounter the following error:

    .. code-block:: console

        from torch._C import * # noqa: F403
        ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory

    Please install OpenBLAS with the following command:

    .. code-block:: console

        $ sudo apt-get install libopenblas-base libopenmpi-dev libomp-dev