(installation-neuron)=

# Installation with Neuron

vLLM 0.3.3 and later supports model inference and serving on AWS Trainium/Inferentia with the Neuron SDK, including continuous batching.
Paged Attention and Chunked Prefill are currently in development and will be available soon.
The Neuron SDK currently supports the FP16 and BF16 data types.

## Requirements

- OS: Linux
- Python: 3.9 -- 3.11
- Accelerator: NeuronCore_v2 (in trn1/inf2 instances)
- PyTorch 2.0.1/2.1.1
- AWS Neuron SDK 2.16/2.17 (verified on Python 3.8)

Installation steps:

- [Build from source](#build-from-source-neuron)

  - [Step 0. Launch Trn1/Inf2 instances](#launch-instances)
  - [Step 1. Install drivers and tools](#install-drivers)
  - [Step 2. Install transformers-neuronx and its dependencies](#install-tnx)
  - [Step 3. Install vLLM from source](#install-vllm)

(build-from-source-neuron)=

```{note}
The currently supported version of PyTorch for Neuron installs `triton` version `2.1.0`, which is incompatible with vLLM >= 0.5.3. You may see the error `cannot import name 'default_dump_dir...`. To work around this, run `pip install --upgrade triton==3.0.0` after installing the vLLM wheel.
```

## Build from source

The following instructions apply to Neuron SDK 2.16 and later.

(launch-instances)=

### Step 0. Launch Trn1/Inf2 instances

Follow these steps to launch a Trn1/Inf2 instance on which to install [PyTorch Neuron ("torch-neuronx") Setup on Ubuntu 22.04 LTS](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html).

- Follow the instructions at [launch an Amazon EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance) to launch an instance. When choosing the instance type in the EC2 console, make sure to select the correct instance type (Trn1 or Inf2).
- For more information about instance sizes and pricing, see the [Trn1 web page](https://aws.amazon.com/ec2/instance-types/trn1/) and the [Inf2 web page](https://aws.amazon.com/ec2/instance-types/inf2/).
- Select the Ubuntu Server 22.04 LTS AMI.
- When launching a Trn1/Inf2 instance, set the primary EBS volume size to at least 512 GB.
- After launching the instance, follow the instructions in [Connect to your instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html) to connect to it.

(install-drivers)=

### Step 1. Install drivers and tools

Installing the drivers and tools is not necessary if [Deep Learning AMI Neuron](https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html) is installed. If the drivers and tools are not installed on the operating system, follow the steps below:

```console
# Configure Linux for Neuron repository updates
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <