[CI/Build] improve python-only dev setup (#9621)

Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Authored by Daniele <dtrifiro@redhat.com> on 2024-12-04 22:48:13 +01:00; committed by GitHub
parent 82eb5ea8f3
commit e4c34c23de

4 changed files with 102 additions and 121 deletions

docs/source/getting_started/installation.rst

@@ -21,7 +21,7 @@ You can install vLLM using pip:
 .. code-block:: console

     $ # (Recommended) Create a new conda environment.
-    $ conda create -n myenv python=3.10 -y
+    $ conda create -n myenv python=3.12 -y
     $ conda activate myenv

     $ # Install vLLM with CUDA 12.1.
@@ -89,45 +89,24 @@ Build from source
 Python-only build (without compilation)
 ---------------------------------------

-If you only need to change Python code, you can simply build vLLM without compilation.
-
-The first step is to install the latest vLLM wheel:
-
-.. code-block:: console
-
-    pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
-
-You can find more information about vLLM's wheels `above <#install-the-latest-code>`_.
-
-After verifying that the installation is successful, you can use `the following script <https://github.com/vllm-project/vllm/blob/main/python_only_dev.py>`_:
+If you only need to change Python code, you can build and install vLLM without compilation. Using `pip's ``--editable`` flag <https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs>`_, changes you make to the code will be reflected when you run vLLM:

 .. code-block:: console

     $ git clone https://github.com/vllm-project/vllm.git
     $ cd vllm
-    $ python python_only_dev.py
+    $ VLLM_USE_PRECOMPILED=1 pip install --editable .

-The script will:
-
-* Find the installed vLLM package in the current environment.
-* Copy built files to the current directory.
-* Rename the installed vLLM package.
-* Symbolically link the current directory to the installed vLLM package.
-
-Now, you can edit the Python code in the current directory, and the changes will be reflected when you run vLLM.
+This will download the latest nightly wheel and use the compiled libraries from there in the install.

-Once you have finished editing or want to install another vLLM wheel, you should exit the development environment using `the same script <https://github.com/vllm-project/vllm/blob/main/python_only_dev.py>`_ with the ``--quit-dev`` (or ``-q`` for short) flag:
+The ``VLLM_PRECOMPILED_WHEEL_LOCATION`` environment variable can be used instead of ``VLLM_USE_PRECOMPILED`` to specify a custom path or URL to the wheel file. For example, to use the `0.6.3.post1 PyPI wheel <https://pypi.org/project/vllm/#files>`_:

 .. code-block:: console

-    $ python python_only_dev.py --quit-dev
+    $ export VLLM_PRECOMPILED_WHEEL_LOCATION=https://files.pythonhosted.org/packages/4a/4c/ee65ba33467a4c0de350ce29fbae39b9d0e7fcd887cc756fa993654d1228/vllm-0.6.3.post1-cp38-abi3-manylinux1_x86_64.whl
+    $ pip install --editable .

-The ``--quit-dev`` flag will:
-
-* Remove the symbolic link from the current directory to the vLLM package.
-* Restore the original vLLM package from the backup.
-
-If you update the vLLM wheel and rebuild from the source to make further edits, you will need to repeat the `Python-only build <#python-only-build>`_ steps again.
-
+You can find more information about vLLM's wheels `above <#install-the-latest-code>`_.
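
A quick way to confirm that the editable install took effect (an illustrative check, not part of the documented steps) is to verify that the import resolves to the checkout rather than to site-packages:

.. code-block:: console

    $ python -c "import vllm; print(vllm.__file__)"
    # expected: a path inside the cloned vllm repository
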
.. note::
@@ -148,9 +127,13 @@ If you want to modify C++ or CUDA code, you'll need to build vLLM from source. This can take several minutes:

 .. tip::
     Building from source requires a lot of compilation. If you are building from source repeatedly, it's more efficient to cache the compilation results.
     For example, you can install `ccache <https://github.com/ccache/ccache>`_ using ``conda install ccache`` or ``apt install ccache``.
     As long as the ``which ccache`` command can find the ``ccache`` binary, it will be used automatically by the build system. After the first build, subsequent builds will be much faster.

+    `sccache <https://github.com/mozilla/sccache>`_ works similarly to ``ccache``, but has the capability to utilize caching in remote storage environments.
+    The following environment variables can be set to configure the vLLM ``sccache`` remote: ``SCCACHE_BUCKET=vllm-build-sccache SCCACHE_REGION=us-west-2 SCCACHE_S3_NO_CREDENTIALS=1``. We also recommend setting ``SCCACHE_IDLE_TIMEOUT=0``.
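
Concretely, a remote-cached build could be configured as follows (an illustrative sketch using the variable values quoted in the tip, assuming ``sccache`` is installed and discoverable by the build system):

.. code-block:: console

    $ export SCCACHE_BUCKET=vllm-build-sccache
    $ export SCCACHE_REGION=us-west-2
    $ export SCCACHE_S3_NO_CREDENTIALS=1
    $ export SCCACHE_IDLE_TIMEOUT=0
    $ pip install --editable .  # compilation results are cached in remote storage
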
Use an existing PyTorch installation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

python_only_dev.py

@@ -1,92 +1,14 @@
-# enable python only development
-# copy compiled files to the current directory directly
-
-import argparse
-import os
-import shutil
-import subprocess
-import sys
-import warnings
-
-parser = argparse.ArgumentParser(
-    description="Development mode for python-only code")
-parser.add_argument('-q',
-                    '--quit-dev',
-                    action='store_true',
-                    help='Set the flag to quit development mode')
-args = parser.parse_args()
-
-# cannot directly `import vllm` , because it will try to
-# import from the current directory
-output = subprocess.run([sys.executable, "-m", "pip", "show", "vllm"],
-                        capture_output=True)
-
-assert output.returncode == 0, "vllm is not installed"
-
-text = output.stdout.decode("utf-8")
-
-package_path = None
-for line in text.split("\n"):
-    if line.startswith("Location: "):
-        package_path = line.split(": ")[1]
-        break
-
-assert package_path is not None, "could not find package path"
-
-cwd = os.getcwd()
-
-assert cwd != package_path, "should not import from the current directory"
-
-files_to_copy = [
-    "vllm/_C.abi3.so",
-    "vllm/_moe_C.abi3.so",
-    "vllm/vllm_flash_attn/vllm_flash_attn_c.abi3.so",
-    "vllm/vllm_flash_attn/flash_attn_interface.py",
-    "vllm/vllm_flash_attn/__init__.py",
-    # "vllm/_version.py", # not available in nightly wheels yet
-]
-
-# Try to create _version.py to avoid version related warning
-# Refer to https://github.com/vllm-project/vllm/pull/8771
-try:
-    from setuptools_scm import get_version
-    get_version(write_to="vllm/_version.py")
-except ImportError:
-    warnings.warn(
-        "To avoid warnings related to vllm._version, "
-        "you should install setuptools-scm by `pip install setuptools-scm`",
-        stacklevel=2)
-
-if not args.quit_dev:
-    for file in files_to_copy:
-        src = os.path.join(package_path, file)
-        dst = file
-        print(f"Copying {src} to {dst}")
-        shutil.copyfile(src, dst)
-
-    pre_built_vllm_path = os.path.join(package_path, "vllm")
-    tmp_path = os.path.join(package_path, "vllm_pre_built")
-    current_vllm_path = os.path.join(cwd, "vllm")
-
-    print(f"Renaming {pre_built_vllm_path} to {tmp_path} for backup")
-    shutil.copytree(pre_built_vllm_path, tmp_path)
-    shutil.rmtree(pre_built_vllm_path)
-
-    print(f"Linking {current_vllm_path} to {pre_built_vllm_path}")
-    os.symlink(current_vllm_path, pre_built_vllm_path)
-else:
-    vllm_symlink_path = os.path.join(package_path, "vllm")
-    vllm_backup_path = os.path.join(package_path, "vllm_pre_built")
-    current_vllm_path = os.path.join(cwd, "vllm")
-
-    print(f"Unlinking {current_vllm_path} to {vllm_symlink_path}")
-    assert os.path.islink(
-        vllm_symlink_path
-    ), f"not in dev mode: {vllm_symlink_path} is not a symbolic link"
-    assert current_vllm_path == os.readlink(
-        vllm_symlink_path
-    ), "current directory is not the source code of package"
-    os.unlink(vllm_symlink_path)
-
-    print(f"Recovering backup from {vllm_backup_path} to {vllm_symlink_path}")
-    os.rename(vllm_backup_path, vllm_symlink_path)
+msg = """Old style python only build (without compilation) is deprecated, please check https://docs.vllm.ai/en/latest/getting_started/installation.html#python-only-build-without-compilation for the new way to do python only build (without compilation).
+
+TL;DR:
+
+VLLM_USE_PRECOMPILED=1 pip install -e .
+
+or
+
+export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
+export VLLM_PRECOMPILED_WHEEL_LOCATION=https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
+pip install -e .
+"""  # noqa
+
+print(msg)

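Since the script is now a stub, invoking it only prints the pointer to the new workflow (output truncated here for brevity):

.. code-block:: console

    $ python python_only_dev.py
    Old style python only build (without compilation) is deprecated, please check https://docs.vllm.ai/en/latest/getting_started/installation.html#python-only-build-without-compilation ...
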
setup.py

@@ -249,6 +249,74 @@ class cmake_build_ext(build_ext):
             self.copy_file(file, dst_file)


+class repackage_wheel(build_ext):
+    """Extracts libraries and other files from an existing wheel."""
+
+    default_wheel = "https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"
+
+    def run(self) -> None:
+        wheel_location = os.getenv("VLLM_PRECOMPILED_WHEEL_LOCATION",
+                                   self.default_wheel)
+
+        assert _is_cuda(
+        ), "VLLM_USE_PRECOMPILED is only supported for CUDA builds"
+
+        import zipfile
+
+        if os.path.isfile(wheel_location):
+            wheel_path = wheel_location
+            print(f"Using existing wheel={wheel_path}")
+        else:
+            # Download the wheel from a given URL, assume
+            # the filename is the last part of the URL
+            wheel_filename = wheel_location.split("/")[-1]
+
+            import tempfile
+
+            # create a temporary directory to store the wheel
+            temp_dir = tempfile.mkdtemp(prefix="vllm-wheels")
+            wheel_path = os.path.join(temp_dir, wheel_filename)
+
+            print(f"Downloading wheel from {wheel_location} to {wheel_path}")
+
+            from urllib.request import urlretrieve
+
+            try:
+                urlretrieve(wheel_location, filename=wheel_path)
+            except Exception as e:
+                from setuptools.errors import SetupError
+
+                raise SetupError(
+                    f"Failed to get vLLM wheel from {wheel_location}") from e
+
+        with zipfile.ZipFile(wheel_path) as wheel:
+            files_to_copy = [
+                "vllm/_C.abi3.so",
+                "vllm/_moe_C.abi3.so",
+                "vllm/vllm_flash_attn/vllm_flash_attn_c.abi3.so",
+                "vllm/vllm_flash_attn/flash_attn_interface.py",
+                "vllm/vllm_flash_attn/__init__.py",
+                # "vllm/_version.py", # not available in nightly wheels yet
+            ]
+            file_members = filter(lambda x: x.filename in files_to_copy,
+                                  wheel.filelist)
+
+            for file in file_members:
+                print(f"Extracting and including {file.filename} "
+                      "from existing wheel")
+                package_name = os.path.dirname(file.filename).replace("/", ".")
+                file_name = os.path.basename(file.filename)
+
+                if package_name not in package_data:
+                    package_data[package_name] = []
+
+                wheel.extract(file)
+                if file_name.endswith(".py"):
+                    # python files shouldn't be added to package_data
+                    continue
+
+                package_data[package_name].append(file_name)
+
+
 def _is_hpu() -> bool:
     is_hpu_available = True
     try:
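
A wheel is an ordinary zip archive, which is why ``repackage_wheel`` needs nothing beyond the standard library to lift the prebuilt binaries out. A minimal standalone sketch of the same idea (the local wheel path is hypothetical; the member names are the ones listed in ``files_to_copy`` above):

.. code-block:: python

    import zipfile

    # hypothetical local wheel; setup.py resolves this from
    # VLLM_PRECOMPILED_WHEEL_LOCATION or falls back to the nightly URL
    wheel_path = "vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"

    wanted = {
        "vllm/_C.abi3.so",
        "vllm/_moe_C.abi3.so",
        "vllm/vllm_flash_attn/vllm_flash_attn_c.abi3.so",
    }

    with zipfile.ZipFile(wheel_path) as wheel:
        for member in wheel.filelist:
            if member.filename in wanted:
                # extract() writes relative to the cwd, e.g. ./vllm/_C.abi3.so
                wheel.extract(member)
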
@@ -403,6 +471,8 @@ def get_vllm_version() -> str:
         # skip this for source tarball, required for pypi
         if "sdist" not in sys.argv:
             version += f"{sep}cu{cuda_version_str}"
+        if envs.VLLM_USE_PRECOMPILED:
+            version += ".precompiled"
     elif _is_hip():
         # Get the HIP version
         hipcc_version = get_hipcc_rocm_version()
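
The suffix makes precompiled installs distinguishable at runtime. Assuming a CUDA 12.1 nightly wheel, the reported version would look something like this (hypothetical output; the exact local-version segment depends on the CUDA version detected):

.. code-block:: console

    $ python -c "import vllm; print(vllm.__version__)"
    1.0.0.dev+cu121.precompiled
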
@@ -514,13 +584,18 @@ if _build_custom_ops():
 package_data = {
     "vllm": ["py.typed", "model_executor/layers/fused_moe/configs/*.json"]
 }
-if envs.VLLM_USE_PRECOMPILED:
-    ext_modules = []
-    package_data["vllm"].append("*.so")
-
 if _no_device():
     ext_modules = []

+if not ext_modules:
+    cmdclass = {}
+else:
+    cmdclass = {
+        "build_ext":
+        repackage_wheel if envs.VLLM_USE_PRECOMPILED else cmake_build_ext
+    }
+
 setup(
     name="vllm",
     version=get_vllm_version(),
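
The net effect is that the same pip command is routed to either build backend by the environment (an illustrative summary):

.. code-block:: console

    $ VLLM_USE_PRECOMPILED=1 pip install --editable .   # build_ext = repackage_wheel: fetch wheel, no compilation
    $ pip install --editable .                          # build_ext = cmake_build_ext: full source build
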
@@ -557,7 +632,7 @@ setup(
         "audio": ["librosa", "soundfile"],  # Required for audio processing
         "video": ["decord"]  # Required for video processing
     },
-    cmdclass={"build_ext": cmake_build_ext} if len(ext_modules) > 0 else {},
+    cmdclass=cmdclass,
     package_data=package_data,
     entry_points={
         "console_scripts": [

vllm/envs.py

@@ -113,7 +113,8 @@ environment_variables: Dict[str, Callable[[], Any]] = {
     # If set, vllm will use precompiled binaries (*.so)
     "VLLM_USE_PRECOMPILED":
-    lambda: bool(os.environ.get("VLLM_USE_PRECOMPILED")),
+    lambda: bool(os.environ.get("VLLM_USE_PRECOMPILED")) or bool(
+        os.environ.get("VLLM_PRECOMPILED_WHEEL_LOCATION")),

     # CMake build type
     # If not set, defaults to "Debug" or "RelWithDebInfo"
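
One consequence of this change: setting only ``VLLM_PRECOMPILED_WHEEL_LOCATION`` now implies ``VLLM_USE_PRECOMPILED``. A quick check (assuming vLLM is installed; ``vllm.envs`` resolves these lambdas on attribute access):

.. code-block:: console

    $ export VLLM_PRECOMPILED_WHEEL_LOCATION=https://files.pythonhosted.org/packages/4a/4c/ee65ba33467a4c0de350ce29fbae39b9d0e7fcd887cc756fa993654d1228/vllm-0.6.3.post1-cp38-abi3-manylinux1_x86_64.whl
    $ python -c "import vllm.envs as envs; print(envs.VLLM_USE_PRECOMPILED)"
    True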