vllm/tests/entrypoints/openai/test_models.py

# SPDX-License-Identifier: Apache-2.0
import openai # use the official client for correctness check
import pytest
import pytest_asyncio
# downloading lora to test lora requests
from huggingface_hub import snapshot_download

from ...utils import RemoteOpenAIServer

# any model with a chat template should work here
MODEL_NAME = "HuggingFaceH4/zephyr-7b-beta"
# technically this needs Mistral-7B-v0.1 as base, but we're not testing
# generation quality here
LORA_NAME = "typeof/zephyr-7b-beta-lora"
@pytest.fixture(scope="module")
def zephyr_lora_files():
    return snapshot_download(repo_id=LORA_NAME)

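# The server fixture below boots a vLLM OpenAI-compatible server in a
# subprocess. Both --lora-modules entries point at the same downloaded
# adapter, so /v1/models should list the base model followed by the two
# LoRA ids checked in test_check_models.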
@pytest.fixture(scope="module")
def server(zephyr_lora_files):
    args = [
        # use half precision for speed and memory savings in CI environment
        "--dtype",
        "bfloat16",
        "--max-model-len",
        "8192",
        "--enforce-eager",
        # lora config below
        "--enable-lora",
        "--lora-modules",
        f"zephyr-lora={zephyr_lora_files}",
        f"zephyr-lora2={zephyr_lora_files}",
        "--max-lora-rank",
        "64",
        "--max-cpu-loras",
        "2",
        "--max-num-seqs",
        "128",
    ]

    with RemoteOpenAIServer(MODEL_NAME, args) as remote_server:
        yield remote_server

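# The client fixture wraps the server in the official OpenAI client;
# get_async_client() is expected to yield an openai.AsyncOpenAI pointed at
# the local server's base URL, so tests exercise the real HTTP surface.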
@pytest_asyncio.fixture
async def client(server):
    async with server.get_async_client() as async_client:
        yield async_client

@pytest.mark.asyncio
async def test_check_models(client: openai.AsyncOpenAI, zephyr_lora_files):
    models = await client.models.list()
    models = models.data
    served_model = models[0]
    lora_models = models[1:]
    assert served_model.id == MODEL_NAME
    assert served_model.root == MODEL_NAME
    assert all(lora_model.root == zephyr_lora_files
               for lora_model in lora_models)
    assert lora_models[0].id == "zephyr-lora"
    assert lora_models[1].id == "zephyr-lora2"
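

# A minimal follow-on sketch (not in the original file): issue a completion
# against one of the LoRA ids registered above. Assumes the server routes a
# request whose `model` field names a --lora-modules entry to that adapter;
# the prompt and sampling parameters are illustrative only.
@pytest.mark.asyncio
async def test_lora_model_handles_request(client: openai.AsyncOpenAI):
    completion = await client.completions.create(
        model="zephyr-lora",  # id registered via --lora-modules
        prompt="Hello, my name is",
        max_tokens=5,
        temperature=0.0,
    )
    # plumbing check only; generation quality is out of scope here
    assert len(completion.choices) == 1
    assert completion.choices[0].text is not None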