# SPDX-License-Identifier: Apache-2.0
"""vLLM: a high-throughput and memory-efficient inference engine for LLMs"""

# version.py should be an independent library, and we always import the
# version library first. This assumption is critical for some customizations.
from .version import __version__, __version_tuple__ # isort:skip
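
# A minimal sketch of how downstream code can read the version info; the
# printed value is illustrative:
#
#     import vllm
#     print(vllm.__version__)        # e.g. "0.7.2"
#     major, minor = vllm.__version_tuple__[:2]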

import os

import torch

from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.engine.llm_engine import LLMEngine
from vllm.entrypoints.llm import LLM
from vllm.executor.ray_utils import initialize_ray_cluster
from vllm.inputs import PromptType, TextPrompt, TokensPrompt
from vllm.model_executor.models import ModelRegistry
from vllm.outputs import (ClassificationOutput, ClassificationRequestOutput,
                          CompletionOutput, EmbeddingOutput,
                          EmbeddingRequestOutput, PoolingOutput,
                          PoolingRequestOutput, RequestOutput, ScoringOutput,
                          ScoringRequestOutput)
from vllm.pooling_params import PoolingParams
from vllm.sampling_params import SamplingParams

# Set some common config/environment variables that should be set
# for all processes created by vLLM and for all processes
# that interact with vLLM workers.
# They are executed whenever `import vllm` is called.

# Disable NCCL's cuMem allocator, which has been reported to cause problems;
# see https://github.com/NVIDIA/nccl/issues/1234
os.environ['NCCL_CUMEM_ENABLE'] = '0'

# Force TorchInductor to compile in a single thread, avoiding issues seen
# with concurrent compile workers;
# see https://github.com/vllm-project/vllm/issues/10480
os.environ['TORCHINDUCTOR_COMPILE_THREADS'] = '1'
# The environment variable may not take effect if Inductor's config was
# already initialized, so set the config value directly as well;
# see https://github.com/vllm-project/vllm/issues/10619
torch._inductor.config.compile_threads = 1
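
# A quick sanity check that the settings above took effect (a sketch, meant
# to run after `import vllm`):
#
#     import os
#     import torch
#     import vllm  # noqa: F401  (triggers the settings above)
#
#     assert os.environ["NCCL_CUMEM_ENABLE"] == "0"
#     assert torch._inductor.config.compile_threads == 1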

__all__ = [
    "__version__",
    "__version_tuple__",
    "LLM",
    "ModelRegistry",
    "PromptType",
    "TextPrompt",
    "TokensPrompt",
    "SamplingParams",
    "RequestOutput",
    "CompletionOutput",
    "PoolingOutput",
    "PoolingRequestOutput",
    "EmbeddingOutput",
    "EmbeddingRequestOutput",
    "ClassificationOutput",
    "ClassificationRequestOutput",
    "ScoringOutput",
    "ScoringRequestOutput",
    "LLMEngine",
    "EngineArgs",
    "AsyncLLMEngine",
    "AsyncEngineArgs",
    "initialize_ray_cluster",
    "PoolingParams",
]
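
# A minimal usage sketch of the public API exported above; the model name is
# illustrative, and this is kept as a comment so that `import vllm` has no
# side effects beyond the settings above:
#
#     from vllm import LLM, SamplingParams
#
#     llm = LLM(model="facebook/opt-125m")
#     params = SamplingParams(temperature=0.8, max_tokens=64)
#     for output in llm.generate(["Hello, my name is"], params):
#         print(output.outputs[0].text)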