[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
parent 726efc6a32
commit 4e0f6076be
@@ -24,7 +24,7 @@ This document describes how vLLM deals with these challenges.
 [Python multiprocessing methods](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) include:
 
 - `spawn` - spawn a new Python process. This will be the default as of Python
-3.14.
+3.14. In macOS, this is already the default.
 
 - `fork` - Use `os.fork()` to fork the Python interpreter. This is the default
 in Python versions prior to 3.14.
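This hunk states which start method each platform uses by default. The standard library reports the platform's default as the first entry of `multiprocessing.get_all_start_methods()`, so you can check what the running interpreter considers its default. The sketch below uses only the standard library. `platform_default_start_method` is a hypothetical helper name, not part of vLLM.

```python
import multiprocessing as mp
import sys


def platform_default_start_method() -> str:
    """Return this platform's default start method (hypothetical helper).

    get_all_start_methods() lists the supported methods with the platform
    default first. Unlike mp.get_start_method(), calling it does not fix
    the start method for the rest of the program.
    """
    return mp.get_all_start_methods()[0]


if __name__ == "__main__":
    # Expected to be "spawn" on macOS (Python 3.8+) and "fork" on Linux
    # before Python 3.14; the result depends on the interpreter in use.
    print(f"{sys.platform}: {platform_default_start_method()}")
```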
@@ -34,7 +34,7 @@ This document describes how vLLM deals with these challenges.
 ### Tradeoffs
 
 `fork` is the fastest method, but is incompatible with dependencies that use
-threads.
+threads. If you are under macOS, using `fork` may cause the process to crash.
 
 `spawn` is more compatible with dependencies, but can be problematic when vLLM
 is used as a library. If the consuming code does not use a `__main__` guard (`if
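The `__main__` guard requirement for `spawn` can be illustrated with the standard library alone. In this sketch, `make_worker_context` is a hypothetical helper. It selects a start method through a context object rather than changing the global default. The guarded block shows what a consuming script needs so that `spawn` workers can re-import the module safely.

```python
import multiprocessing as mp


def make_worker_context(method: str = "spawn"):
    """Return a multiprocessing context for an explicit start method.

    Using a context object avoids mp.set_start_method(), which changes
    the global default and is meant to be called at most once per program.
    """
    if method not in mp.get_all_start_methods():
        raise ValueError(f"start method {method!r} is not supported here")
    return mp.get_context(method)


if __name__ == "__main__":
    # With "spawn", each worker re-imports this module. Without the guard,
    # every child would try to create its own pool while importing, and
    # multiprocessing would raise a RuntimeError.
    ctx = make_worker_context("spawn")
    with ctx.Pool(processes=2) as pool:
        # abs is a builtin, so it pickles by reference in every child.
        print(pool.map(abs, [-1, -2, 3]))  # [1, 2, 3]
```

The trade-off matches the hunk: `spawn` pays the cost of starting a fresh interpreter per worker, but it does not inherit the parent's threads.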
@@ -125,8 +125,13 @@ class ShmRingBuffer:
                        lambda *args, **kwargs: None):
                 try:
                     self.shared_memory = shared_memory.SharedMemory(name=name)
-                    assert (
-                        self.shared_memory.size == self.total_bytes_of_buffer)
+                    # See https://docs.python.org/3/library/multiprocessing.shared_memory.html # noqa
+                    # Some platforms allocate memory based on page size,
+                    # so the shared memory block size may be larger or equal
+                    # to the requested size. The size parameter is ignored
+                    # when attaching to an existing block.
+                    assert (self.shared_memory.size
+                            >= self.total_bytes_of_buffer)
                 except FileNotFoundError:
                     # we might deserialize the object in a different node
                     # in this case, this object is not used,
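The relaxed assertion follows the documented `SharedMemory` behaviour. The reported `size` may exceed the requested size on platforms that allocate whole pages, and `size` is ignored when attaching by name. The standard-library sketch below performs the same `>=` check. `attached_sizes` and `REQUESTED_BYTES` are hypothetical names, and the exact sizes printed depend on the platform.

```python
from multiprocessing import shared_memory
from typing import Tuple

# Deliberately not a multiple of any common page size.
REQUESTED_BYTES = 100


def attached_sizes(requested: int) -> Tuple[int, int]:
    """Create a block, attach to it by name, and report both sizes."""
    creator = shared_memory.SharedMemory(create=True, size=requested)
    try:
        # No size argument: it would be ignored when attaching anyway,
        # so the reader sees whatever size the platform allocated.
        reader = shared_memory.SharedMemory(name=creator.name)
        try:
            return creator.size, reader.size
        finally:
            reader.close()
    finally:
        creator.close()
        creator.unlink()  # release the block so it does not leak


if __name__ == "__main__":
    created, attached = attached_sizes(REQUESTED_BYTES)
    # The documented guarantee is only "larger or equal", so an equality
    # check can fail where sizes are page-rounded.
    assert attached >= REQUESTED_BYTES
    print(created, attached)
```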
|
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: Apache-2.0
 
 import os
+import sys
 from typing import TYPE_CHECKING, Optional
 
 import psutil
@@ -148,6 +149,13 @@ class CpuPlatform(Platform):
         # To hint IPEX uses shared memory based AllReduce
         os.environ["LOCAL_WORLD_SIZE"] = str(
             vllm_config.parallel_config.tensor_parallel_size)
+        if sys.platform == "darwin" and \
+                envs.VLLM_WORKER_MULTIPROC_METHOD == "fork":
+            if os.environ.get('VLLM_WORKER_MULTIPROC_METHOD', None) is None:
+                logger.warning(
+                    "Default to spawn method on MacOS. If this is not desired,"
+                    " set VLLM_WORKER_MULTIPROC_METHOD to fork explicitly.")
+                os.environ['VLLM_WORKER_MULTIPROC_METHOD'] = 'spawn'
 
     @classmethod
     def is_pin_memory_available(cls) -> bool:
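The new `CpuPlatform` branch switches to `spawn` only when two conditions hold: the effective method is `fork`, and the user left `VLLM_WORKER_MULTIPROC_METHOD` unset. An explicit `fork` is therefore respected. The sketch below replays that decision on a plain dictionary so it can be tested without vLLM. It assumes vLLM falls back to `"fork"` when the variable is unset, which is why `envs.VLLM_WORKER_MULTIPROC_METHOD == "fork"` is true in that case. `apply_macos_spawn_default` and `ASSUMED_DEFAULT` are hypothetical names.

```python
import sys
from typing import MutableMapping, Optional

ENV_VAR = "VLLM_WORKER_MULTIPROC_METHOD"
# Assumption: the value vLLM uses when ENV_VAR is not set.
ASSUMED_DEFAULT = "fork"


def apply_macos_spawn_default(
    env: MutableMapping[str, str],
    platform: str = sys.platform,
) -> Optional[str]:
    """Mirror the commit's fallback on a mapping (hypothetical helper).

    Returns the warning text if the method was switched, else None.
    """
    effective = env.get(ENV_VAR, ASSUMED_DEFAULT)
    if platform == "darwin" and effective == "fork":
        # "Unset" and "explicitly fork" both read as "fork" above;
        # only the unset case is overridden.
        if env.get(ENV_VAR) is None:
            env[ENV_VAR] = "spawn"
            return ("Default to spawn method on MacOS. If this is not desired,"
                    " set VLLM_WORKER_MULTIPROC_METHOD to fork explicitly.")
    return None
```

Passing `os.environ` instead of a dictionary would mutate the real process environment, which is what the committed code does. Later reads of the variable then see `spawn`.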
|