20231088/vllm

Rafael Vasquez 32aa2059ad

[Docs] Convert rST to MyST (Markdown) (#11145)

Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>

2024-12-23 22:35:38 +00:00


(installation-arm)=

# Installation for ARM CPUs

vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform. This guide provides installation instructions specific to ARM. For additional details on supported features, refer to the x86 platform documentation covering:

- CPU backend inference capabilities
- Relevant runtime environment variables
- Performance optimization tips

The ARM CPU backend currently supports the Float32, FP16, and BFloat16 data types.

Contents:

- Requirements
- Quick Start with Dockerfile
- Building from Source

(arm-backend-requirements)=

## Requirements

- Operating System: Linux or macOS
- Compiler: gcc/g++ >= 12.3.0 (optional, but recommended)
- Instruction Set Architecture (ISA): NEON support is required
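
The requirements above can be checked before building. The following is a minimal sketch, not part of vLLM: the helper names (`has_neon`, `parse_version`, `check_host`) are hypothetical. It assumes a Linux host, where `/proc/cpuinfo` reports Advanced SIMD (NEON) as `asimd` on AArch64 or `neon` on 32-bit ARM, and a `gcc` that accepts `-dumpfullversion`. On macOS there is no `/proc/cpuinfo`, so the NEON field is left as `None`.

```python
import platform
import re
import subprocess
from pathlib import Path

MIN_GCC = (12, 3, 0)  # recommended minimum from the requirements list


def has_neon(cpuinfo_text: str) -> bool:
    """Return True if a /proc/cpuinfo dump lists NEON (Advanced SIMD)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features":
            flags = value.split()
            # AArch64 kernels report NEON as "asimd"; 32-bit ARM uses "neon".
            if "asimd" in flags or "neon" in flags:
                return True
    return False


def parse_version(text: str):
    """Extract the first X.Y.Z version as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def check_host() -> dict:
    """Collect machine type, NEON flag, and gcc version; None means unknown."""
    report = {"machine": platform.machine(), "neon": None, "gcc": None}
    try:
        report["neon"] = has_neon(Path("/proc/cpuinfo").read_text())
    except OSError:
        pass  # no /proc/cpuinfo (for example on macOS)
    try:
        result = subprocess.run(
            ["gcc", "-dumpfullversion"],
            capture_output=True, text=True, check=True,
        )
        report["gcc"] = parse_version(result.stdout)
    except (OSError, subprocess.CalledProcessError):
        pass  # gcc missing or does not accept the flag
    return report


if __name__ == "__main__":
    info = check_host()
    print(info)
    if info["gcc"] is not None and info["gcc"] < MIN_GCC:
        print("gcc is older than the recommended 12.3.0")
```

Because the compiler requirement is listed as optional, the sketch only prints a notice for an older gcc rather than failing.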

(arm-backend-quick-start-dockerfile)=

## Quick Start with Dockerfile

You can quickly set up vLLM on ARM using Docker:

$ docker build -f Dockerfile.arm -t vllm-cpu-env --shm-size=4g .
$ docker run -it \
             --rm \
             --network=host \
             --cpuset-cpus=<cpu-id-list, optional> \
             --cpuset-mems=<memory-node, optional> \
             vllm-cpu-env
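
When the container is launched from a script, the two optional flags can be added only when values are supplied. The sketch below assembles the same `docker run` invocation as an argument list. The function name `docker_run_command` and the example values `0-7` and `0` are illustrative assumptions, not recommendations.

```python
import shlex


def docker_run_command(image="vllm-cpu-env", cpuset_cpus=None, cpuset_mems=None):
    """Build the docker run argument list, skipping unset optional flags."""
    args = ["docker", "run", "-it", "--rm", "--network=host"]
    if cpuset_cpus:
        # CPU IDs the container may run on (hypothetical example: "0-7")
        args.append(f"--cpuset-cpus={cpuset_cpus}")
    if cpuset_mems:
        # Memory (NUMA) nodes the container may allocate from (example: "0")
        args.append(f"--cpuset-mems={cpuset_mems}")
    args.append(image)
    return args


if __name__ == "__main__":
    print(shlex.join(docker_run_command(cpuset_cpus="0-7", cpuset_mems="0")))
    # prints: docker run -it --rm --network=host --cpuset-cpus=0-7 --cpuset-mems=0 vllm-cpu-env
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.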

(build-arm-backend-from-source)=

## Building from Source

To build vLLM from source on Ubuntu 22.04 or another Linux distribution, follow a process similar to the x86 build. Compatibility has been tested on AWS Graviton3 instances.
