36 lines
1.1 KiB
Markdown
36 lines
1.1 KiB
Markdown
![]() |
# Installation
|
||
|
|
||
|
vLLM initially supports basic model inferencing and serving on x86 CPU platform, with data types FP32, FP16 and BF16.
|
||
|
|
||
|
## Requirements
|
||
|
|
||
|
- OS: Linux
|
||
|
- Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
|
||
|
- Instruction Set Architecture (ISA): AVX512 (optional, recommended)
|
||
|
|
||
|
## Set up using Python
|
||
|
|
||
|
### Pre-built wheels
|
||
|
|
||
|
### Build wheel from source
|
||
|
|
||
|
:::{include} build.inc.md
|
||
|
:::
|
||
|
|
||
|
```{note}
|
||
|
- AVX512_BF16 is an extension ISA provides native BF16 data type conversion and vector product instructions, will brings some performance improvement compared with pure AVX512. The CPU backend build script will check the host CPU flags to determine whether to enable AVX512_BF16.
|
||
|
- If you want to force enable AVX512_BF16 for the cross-compilation, please set environment variable `VLLM_CPU_AVX512BF16=1` before the building.
|
||
|
```
|
||
|
|
||
|
## Set up using Docker
|
||
|
|
||
|
### Pre-built images
|
||
|
|
||
|
### Build image from source
|
||
|
|
||
|
## Extra information
|
||
|
|
||
|
## Intel Extension for PyTorch
|
||
|
|
||
|
- [Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch) extends PyTorch with up-to-date features optimizations for an extra performance boost on Intel hardware.
|