[CI/Build] improve python-only dev setup (#9621)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
parent 82eb5ea8f3
commit e4c34c23de
docs/source/getting_started/installation.rst
@@ -21,7 +21,7 @@ You can install vLLM using pip:
 .. code-block:: console
 
     $ # (Recommended) Create a new conda environment.
-    $ conda create -n myenv python=3.10 -y
+    $ conda create -n myenv python=3.12 -y
     $ conda activate myenv
 
     $ # Install vLLM with CUDA 12.1.
@@ -89,45 +89,24 @@ Build from source
 Python-only build (without compilation)
 ---------------------------------------
 
-If you only need to change Python code, you can simply build vLLM without compilation.
-
-The first step is to install the latest vLLM wheel:
-
-.. code-block:: console
-
-    pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
-
-You can find more information about vLLM's wheels `above <#install-the-latest-code>`_.
-
-After verifying that the installation is successful, you can use `the following script <https://github.com/vllm-project/vllm/blob/main/python_only_dev.py>`_:
+If you only need to change Python code, you can build and install vLLM without compilation. Using `pip's ``--editable`` flag <https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs>`_, changes you make to the code will be reflected when you run vLLM:
 
 .. code-block:: console
 
     $ git clone https://github.com/vllm-project/vllm.git
     $ cd vllm
-    $ python python_only_dev.py
+    $ VLLM_USE_PRECOMPILED=1 pip install --editable .
 
-The script will:
+This will download the latest nightly wheel and use the compiled libraries from there in the install.
 
-* Find the installed vLLM package in the current environment.
-* Copy built files to the current directory.
-* Rename the installed vLLM package.
-* Symbolically link the current directory to the installed vLLM package.
-
-Now, you can edit the Python code in the current directory, and the changes will be reflected when you run vLLM.
-
-Once you have finished editing or want to install another vLLM wheel, you should exit the development environment using `the same script <https://github.com/vllm-project/vllm/blob/main/python_only_dev.py>`_ with the ``--quit-dev`` (or ``-q`` for short) flag:
+The ``VLLM_PRECOMPILED_WHEEL_LOCATION`` environment variable can be used instead of ``VLLM_USE_PRECOMPILED`` to specify a custom path or URL to the wheel file. For example, to use the `0.6.3.post1 PyPI wheel <https://pypi.org/project/vllm/#files>`_:
 
 .. code-block:: console
 
-    $ python python_only_dev.py --quit-dev
+    $ export VLLM_PRECOMPILED_WHEEL_LOCATION=https://files.pythonhosted.org/packages/4a/4c/ee65ba33467a4c0de350ce29fbae39b9d0e7fcd887cc756fa993654d1228/vllm-0.6.3.post1-cp38-abi3-manylinux1_x86_64.whl
+    $ pip install --editable .
 
-The ``--quit-dev`` flag will:
-
-* Remove the symbolic link from the current directory to the vLLM package.
-* Restore the original vLLM package from the backup.
-
-If you update the vLLM wheel and rebuild from the source to make further edits, you will need to repeat the `Python-only build <#python-only-build>`_ steps again.
+You can find more information about vLLM's wheels `above <#install-the-latest-code>`_.
 
 .. note::
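The two environment variables above interact in a simple way: a custom wheel location takes precedence, and otherwise `VLLM_USE_PRECOMPILED` selects the nightly wheel. A minimal sketch of that precedence (the helper name `resolve_wheel_location` is hypothetical, not part of vLLM; the nightly URL is the one quoted in this diff):

```python
# Hypothetical helper mirroring the documented env-var precedence:
# VLLM_PRECOMPILED_WHEEL_LOCATION overrides the default nightly wheel;
# otherwise VLLM_USE_PRECOMPILED selects the nightly build.
NIGHTLY_WHEEL = ("https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/"
                 "vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl")


def resolve_wheel_location(env: dict):
    """Return the wheel to install from, or None for a normal source build."""
    custom = env.get("VLLM_PRECOMPILED_WHEEL_LOCATION")
    if custom:
        return custom
    if env.get("VLLM_USE_PRECOMPILED"):
        return NIGHTLY_WHEEL
    return None
```
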
@@ -148,9 +127,13 @@ If you want to modify C++ or CUDA code, you'll need to build vLLM from source. T
 .. tip::
 
     Building from source requires a lot of compilation. If you are building from source repeatedly, it's more efficient to cache the compilation results.
     For example, you can install `ccache <https://github.com/ccache/ccache>`_ using ``conda install ccache`` or ``apt install ccache``.
     As long as ``which ccache`` command can find the ``ccache`` binary, it will be used automatically by the build system. After the first build, subsequent builds will be much faster.
 
+    `sccache <https://github.com/mozilla/sccache>`_ works similarly to ``ccache``, but has the capability to utilize caching in remote storage environments.
+    The following environment variables can be set to configure the vLLM ``sccache`` remote: ``SCCACHE_BUCKET=vllm-build-sccache SCCACHE_REGION=us-west-2 SCCACHE_S3_NO_CREDENTIALS=1``. We also recommend setting ``SCCACHE_IDLE_TIMEOUT=0``.
+
 Use an existing PyTorch installation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
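The "used automatically if ``which ccache`` finds it" behavior can be pictured as a lookup on ``PATH`` feeding CMake's standard compiler-launcher variables. This is an illustrative sketch only, assuming a launcher-based mechanism; it is not vLLM's actual build code:

```python
import shutil


def ccache_cmake_args():
    """Hypothetical sketch: if `ccache` is on PATH, route compiler
    invocations through it via CMAKE_<LANG>_COMPILER_LAUNCHER."""
    ccache = shutil.which("ccache")
    if ccache is None:
        return []  # no cache found on PATH; build without a launcher
    return [
        f"-DCMAKE_C_COMPILER_LAUNCHER={ccache}",
        f"-DCMAKE_CXX_COMPILER_LAUNCHER={ccache}",
        f"-DCMAKE_CUDA_COMPILER_LAUNCHER={ccache}",
    ]
```
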
python_only_dev.py
@@ -1,92 +1,14 @@
-# enable python only development
-# copy compiled files to the current directory directly
-
-import argparse
-import os
-import shutil
-import subprocess
-import sys
-import warnings
-
-parser = argparse.ArgumentParser(
-    description="Development mode for python-only code")
-parser.add_argument('-q',
-                    '--quit-dev',
-                    action='store_true',
-                    help='Set the flag to quit development mode')
-args = parser.parse_args()
-
-# cannot directly `import vllm` , because it will try to
-# import from the current directory
-output = subprocess.run([sys.executable, "-m", "pip", "show", "vllm"],
-                        capture_output=True)
-
-assert output.returncode == 0, "vllm is not installed"
-
-text = output.stdout.decode("utf-8")
-
-package_path = None
-for line in text.split("\n"):
-    if line.startswith("Location: "):
-        package_path = line.split(": ")[1]
-        break
-
-assert package_path is not None, "could not find package path"
-
-cwd = os.getcwd()
-
-assert cwd != package_path, "should not import from the current directory"
-
-files_to_copy = [
-    "vllm/_C.abi3.so",
-    "vllm/_moe_C.abi3.so",
-    "vllm/vllm_flash_attn/vllm_flash_attn_c.abi3.so",
-    "vllm/vllm_flash_attn/flash_attn_interface.py",
-    "vllm/vllm_flash_attn/__init__.py",
-    # "vllm/_version.py", # not available in nightly wheels yet
-]
-
-# Try to create _version.py to avoid version related warning
-# Refer to https://github.com/vllm-project/vllm/pull/8771
-try:
-    from setuptools_scm import get_version
-    get_version(write_to="vllm/_version.py")
-except ImportError:
-    warnings.warn(
-        "To avoid warnings related to vllm._version, "
-        "you should install setuptools-scm by `pip install setuptools-scm`",
-        stacklevel=2)
-
-if not args.quit_dev:
-    for file in files_to_copy:
-        src = os.path.join(package_path, file)
-        dst = file
-        print(f"Copying {src} to {dst}")
-        shutil.copyfile(src, dst)
-
-    pre_built_vllm_path = os.path.join(package_path, "vllm")
-    tmp_path = os.path.join(package_path, "vllm_pre_built")
-    current_vllm_path = os.path.join(cwd, "vllm")
-
-    print(f"Renaming {pre_built_vllm_path} to {tmp_path} for backup")
-    shutil.copytree(pre_built_vllm_path, tmp_path)
-    shutil.rmtree(pre_built_vllm_path)
-
-    print(f"Linking {current_vllm_path} to {pre_built_vllm_path}")
-    os.symlink(current_vllm_path, pre_built_vllm_path)
-else:
-    vllm_symlink_path = os.path.join(package_path, "vllm")
-    vllm_backup_path = os.path.join(package_path, "vllm_pre_built")
-    current_vllm_path = os.path.join(cwd, "vllm")
-
-    print(f"Unlinking {current_vllm_path} to {vllm_symlink_path}")
-    assert os.path.islink(
-        vllm_symlink_path
-    ), f"not in dev mode: {vllm_symlink_path} is not a symbolic link"
-    assert current_vllm_path == os.readlink(
-        vllm_symlink_path
-    ), "current directory is not the source code of package"
-    os.unlink(vllm_symlink_path)
-
-    print(f"Recovering backup from {vllm_backup_path} to {vllm_symlink_path}")
-    os.rename(vllm_backup_path, vllm_symlink_path)
+msg = """Old style python only build (without compilation) is deprecated, please check https://docs.vllm.ai/en/latest/getting_started/installation.html#python-only-build-without-compilation for the new way to do python only build (without compilation).
+
+TL;DR:
+
+VLLM_USE_PRECOMPILED=1 pip install -e .
+
+or
+
+export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
+export VLLM_PRECOMPILED_WHEEL_LOCATION=https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
+pip install -e .
+"""  # noqa
+
+print(msg)
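The deleted script located the installed package by scanning `pip show vllm` output for its `Location:` line. That parsing step, pulled out of the removed code as a standalone sketch (the function name is mine, not vLLM's):

```python
def find_package_location(pip_show_output: str):
    """Return the 'Location:' value from `pip show` output, or None.

    This mirrors the loop the removed python_only_dev.py used to find
    where the vllm package was installed.
    """
    for line in pip_show_output.splitlines():
        if line.startswith("Location: "):
            return line.split(": ", 1)[1]
    return None
```
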
setup.py
@@ -249,6 +249,74 @@ class cmake_build_ext(build_ext):
                 self.copy_file(file, dst_file)
 
 
+class repackage_wheel(build_ext):
+    """Extracts libraries and other files from an existing wheel."""
+    default_wheel = "https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"
+
+    def run(self) -> None:
+        wheel_location = os.getenv("VLLM_PRECOMPILED_WHEEL_LOCATION",
+                                   self.default_wheel)
+
+        assert _is_cuda(
+        ), "VLLM_USE_PRECOMPILED is only supported for CUDA builds"
+
+        import zipfile
+
+        if os.path.isfile(wheel_location):
+            wheel_path = wheel_location
+            print(f"Using existing wheel={wheel_path}")
+        else:
+            # Download the wheel from a given URL, assume
+            # the filename is the last part of the URL
+            wheel_filename = wheel_location.split("/")[-1]
+
+            import tempfile
+
+            # create a temporary directory to store the wheel
+            temp_dir = tempfile.mkdtemp(prefix="vllm-wheels")
+            wheel_path = os.path.join(temp_dir, wheel_filename)
+
+            print(f"Downloading wheel from {wheel_location} to {wheel_path}")
+
+            from urllib.request import urlretrieve
+
+            try:
+                urlretrieve(wheel_location, filename=wheel_path)
+            except Exception as e:
+                from setuptools.errors import SetupError
+
+                raise SetupError(
+                    f"Failed to get vLLM wheel from {wheel_location}") from e
+
+        with zipfile.ZipFile(wheel_path) as wheel:
+            files_to_copy = [
+                "vllm/_C.abi3.so",
+                "vllm/_moe_C.abi3.so",
+                "vllm/vllm_flash_attn/vllm_flash_attn_c.abi3.so",
+                "vllm/vllm_flash_attn/flash_attn_interface.py",
+                "vllm/vllm_flash_attn/__init__.py",
+                # "vllm/_version.py", # not available in nightly wheels yet
+            ]
+            file_members = filter(lambda x: x.filename in files_to_copy,
+                                  wheel.filelist)
+
+            for file in file_members:
+                print(f"Extracting and including {file.filename} "
+                      "from existing wheel")
+                package_name = os.path.dirname(file.filename).replace("/", ".")
+                file_name = os.path.basename(file.filename)
+
+                if package_name not in package_data:
+                    package_data[package_name] = []
+
+                wheel.extract(file)
+                if file_name.endswith(".py"):
+                    # python files shouldn't be added to package_data
+                    continue
+
+                package_data[package_name].append(file_name)
+
+
 def _is_hpu() -> bool:
     is_hpu_available = True
     try:
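The core of the new `repackage_wheel` command is ordinary `zipfile` work: a wheel is a zip archive, so selected members can be filtered out of `filelist` and extracted. A self-contained sketch of that step (the helper and the fake wheel are illustrative, not vLLM code):

```python
import os
import tempfile
import zipfile


def extract_wanted_members(wheel_path, wanted, dest):
    """Open a wheel (a zip archive) and extract only the listed members,
    mirroring the filtering done by repackage_wheel above."""
    extracted = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for member in wheel.filelist:
            if member.filename in wanted:
                wheel.extract(member, path=dest)
                extracted.append(member.filename)
    return extracted


# Build a tiny stand-in "wheel" to demonstrate the filtering.
tmp = tempfile.mkdtemp(prefix="demo-wheel")
fake_wheel = os.path.join(tmp, "demo.whl")
with zipfile.ZipFile(fake_wheel, "w") as zf:
    zf.writestr("vllm/_C.abi3.so", b"binary payload")
    zf.writestr("vllm/unrelated.txt", b"skip me")

got = extract_wanted_members(fake_wheel, ["vllm/_C.abi3.so"], tmp)
```
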
@@ -403,6 +471,8 @@ def get_vllm_version() -> str:
         # skip this for source tarball, required for pypi
         if "sdist" not in sys.argv:
             version += f"{sep}cu{cuda_version_str}"
+        if envs.VLLM_USE_PRECOMPILED:
+            version += ".precompiled"
     elif _is_hip():
         # Get the HIP version
         hipcc_version = get_hipcc_rocm_version()
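The hunk above appends a ``.precompiled`` marker after the CUDA tag. A simplified, hypothetical rendering of that composition (vLLM's real `get_vllm_version` derives `sep` and the CUDA string from the environment; the defaults here are assumptions for illustration):

```python
def format_version(base, cuda_version_str="", sep="+", precompiled=False):
    """Illustrative sketch of the version composition in the hunk above:
    CUDA tag first, then the new '.precompiled' suffix when
    VLLM_USE_PRECOMPILED is set."""
    version = base
    if cuda_version_str:
        version += f"{sep}cu{cuda_version_str}"
    if precompiled:
        version += ".precompiled"
    return version
```
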
@@ -514,13 +584,18 @@ if _build_custom_ops():
 package_data = {
     "vllm": ["py.typed", "model_executor/layers/fused_moe/configs/*.json"]
 }
-if envs.VLLM_USE_PRECOMPILED:
-    ext_modules = []
-    package_data["vllm"].append("*.so")
-
 if _no_device():
     ext_modules = []
 
+if not ext_modules:
+    cmdclass = {}
+else:
+    cmdclass = {
+        "build_ext":
+        repackage_wheel if envs.VLLM_USE_PRECOMPILED else cmake_build_ext
+    }
+
 setup(
     name="vllm",
     version=get_vllm_version(),
@@ -557,7 +632,7 @@ setup(
         "audio": ["librosa", "soundfile"],  # Required for audio processing
         "video": ["decord"]  # Required for video processing
     },
-    cmdclass={"build_ext": cmake_build_ext} if len(ext_modules) > 0 else {},
+    cmdclass=cmdclass,
     package_data=package_data,
     entry_points={
         "console_scripts": [
vllm/envs.py
@@ -113,7 +113,8 @@ environment_variables: Dict[str, Callable[[], Any]] = {
 
     # If set, vllm will use precompiled binaries (*.so)
     "VLLM_USE_PRECOMPILED":
-    lambda: bool(os.environ.get("VLLM_USE_PRECOMPILED")),
+    lambda: bool(os.environ.get("VLLM_USE_PRECOMPILED")) or bool(
+        os.environ.get("VLLM_PRECOMPILED_WHEEL_LOCATION")),
 
     # CMake build type
     # If not set, defaults to "Debug" or "RelWithDebInfo"
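The updated lambda makes setting a custom wheel location imply precompiled mode, so users don't need to export both variables. The same logic as a standalone function (the function name is illustrative; vLLM evaluates this via the `environment_variables` lambda):

```python
import os


def vllm_use_precompiled(environ=None) -> bool:
    """Sketch of the updated lambda: VLLM_PRECOMPILED_WHEEL_LOCATION now
    implies precompiled mode even if VLLM_USE_PRECOMPILED is unset."""
    if environ is None:
        environ = os.environ
    return bool(environ.get("VLLM_USE_PRECOMPILED")) or bool(
        environ.get("VLLM_PRECOMPILED_WHEEL_LOCATION"))
```
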