vllm/tests/tpu/test_custom_dispatcher.py

import os

from vllm.compilation.levels import CompilationLevel

from ..utils import compare_two_settings

# On TPU, --enforce-eager still triggers graph compilation, which can
# exceed the default health-check timeout in the MQLLMEngine,
# so we raise the RPC timeout to 30 s (the value is in milliseconds).
os.environ["VLLM_RPC_TIMEOUT"] = "30000"


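def test_custom_dispatcher():
    # Run the same model and CLI args under two torch.compile levels;
    # only the VLLM_TORCH_COMPILE_LEVEL environment variable differs.
    compare_two_settings(
        "google/gemma-2b",
        arg1=["--enforce-eager"],
        arg2=["--enforce-eager"],
        env1={"VLLM_TORCH_COMPILE_LEVEL": str(CompilationLevel.DYNAMO_ONCE)},
        env2={"VLLM_TORCH_COMPILE_LEVEL": str(CompilationLevel.DYNAMO_AS_IS)})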