# Installation
vLLM initially supports basic model inference and serving on the x86 CPU platform, with data types FP32, FP16, and BF16.
:::{attention}
There are no pre-built wheels or images for this device, so you must build vLLM from source.
:::
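Once vLLM is built (see the steps below), selecting one of these data types is a server flag. A minimal sketch, assuming a working installation; the model name is only an illustration, and `VLLM_CPU_KVCACHE_SPACE` (KV cache size in GiB) is the usual CPU-backend knob:

```console
$ # Serve a small model on CPU with BF16 weights/activations.
$ # Assumption: 8 GiB of KV cache fits your host's memory.
$ VLLM_CPU_KVCACHE_SPACE=8 vllm serve facebook/opt-125m --dtype bfloat16
```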
## Requirements
- OS: Linux
- Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
- Instruction Set Architecture (ISA): AVX512 (optional, recommended)
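To check these requirements on your host before building, you can query the compiler version and the CPU's AVX512 feature flags. A quick sketch; flag names such as `avx512f` and `avx512_bf16` vary by CPU model:

```console
$ # Compiler version (needs gcc/g++ >= 12.3.0).
$ gcc --version
$ # List the AVX512 feature flags this CPU advertises.
$ lscpu | grep -o 'avx512[a-z0-9_]*' | sort -u
```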
:::{tip}
Intel Extension for PyTorch (IPEX) extends PyTorch with up-to-date feature optimizations for an extra performance boost on Intel hardware.
:::
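IPEX is shipped as a separate Python package. A minimal install sketch; the installed IPEX version should match your PyTorch build, so consult the IPEX documentation for the exact pairing:

```console
$ pip install intel-extension-for-pytorch
```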
## Set up using Python
### Pre-built wheels

Currently, there are no pre-built CPU wheels; see the attention note above and build from source instead.
### Build wheel from source
:::{include} cpu/build.inc.md
:::
:::{note}
- AVX512_BF16 is an ISA extension that provides native BF16 data type conversion and vector product instructions, which brings some performance improvement compared with pure AVX512. The CPU backend build script checks the host CPU flags to determine whether to enable AVX512_BF16.
- If you want to force-enable AVX512_BF16 for cross-compilation, set the environment variable `VLLM_CPU_AVX512BF16=1` before building.
:::
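For example, assuming the `setup.py`-based flow from `cpu/build.inc.md`, a forced AVX512_BF16 cross-compilation build could look like the following sketch (`VLLM_TARGET_DEVICE=cpu` selects the CPU backend in vLLM's build):

```console
$ # Force AVX512_BF16 codegen even if the build host's CPU flags don't advertise it.
$ VLLM_CPU_AVX512BF16=1 VLLM_TARGET_DEVICE=cpu python setup.py install
```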
## Set up using Docker
### Pre-built images
See <https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo> for the published CPU images.
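A minimal sketch for pulling and running one of these images; `<tag>` is a placeholder for a release tag from the gallery, and the runtime flags here (host networking, KV cache size) are assumptions that depend on the image's entrypoint and your deployment:

```console
$ docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:<tag>
$ # Run the server; pass any vLLM server arguments the image's entrypoint expects.
$ docker run --rm -it --network=host \
    -e VLLM_CPU_KVCACHE_SPACE=8 \
    public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:<tag>
```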