[Docs] Add Docs on Limitations of VLM Support (#5383)
Parent: c5602f0baa
Commit: 856c990041
@@ -92,6 +92,7 @@ autodoc_mock_imports = [
     "vllm._C",
     "PIL",
     "numpy",
+    'triton'
     "tqdm",
     "tensorizer",
 ]
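Review note on the added `'triton'` entry: as captured in this diff, it has no trailing comma. Python joins adjacent string literals at compile time, so `'triton'` and `"tqdm"` merge into a single entry, `"tritontqdm"`, and neither module ends up in `autodoc_mock_imports`. The snippet below reproduces only the entries visible in the hunk (the earlier entries of the real list are not shown here and are omitted) to demonstrate the effect.

```python
# Only the entries visible in the hunk; earlier list entries are omitted.
autodoc_mock_imports = [
    "vllm._C",
    "PIL",
    "numpy",
    'triton'  # no trailing comma, exactly as captured in the diff
    "tqdm",
    "tensorizer",
]

# Adjacent string literals are concatenated, so the list has 5 entries, not 6.
print(autodoc_mock_imports)
print("triton" in autodoc_mock_imports)  # False
print("tqdm" in autodoc_mock_imports)    # False
```

If the intent is to mock both modules, adding a comma after `'triton'` keeps them as separate list entries.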
@@ -16,6 +16,13 @@ The following :ref:`engine arguments <engine_args>` are specific to VLMs:
     :prog: -m vllm.entrypoints.openai.api_server
     :nodefaultconst:
 
+.. important::
+    Currently, the support for vision language models on vLLM has the following limitations:
+
+    * Only single image input is supported per text prompt.
+    * Dynamic ``image_input_shape`` is not supported: the input image will be resized to the static ``image_input_shape``. This means model output might not exactly match the huggingface implementation.
+    We are continuously improving user & developer experience for VLMs. Please raise an issue on GitHub if you have any feedback or feature requests.
+
 Offline Batched Inference
 -------------------------
 
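To make the static-shape limitation concrete, here is a minimal sketch that tells whether an input image will be resized. It assumes `image_input_shape` is a comma-separated `N,C,H,W` string such as `"1,3,336,336"` (that example value is an assumption, not taken from this diff). Both helper functions are hypothetical and not part of vLLM. Image sizes are taken as `(width, height)`, which is the order PIL's `Image.size` uses.

```python
from typing import Tuple


def parse_image_input_shape(shape: str) -> Tuple[int, int, int, int]:
    """Hypothetical helper: parse an assumed "N,C,H,W" string into four ints."""
    parts = [int(d) for d in shape.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated dims, got {shape!r}")
    n, c, h, w = parts
    return n, c, h, w


def will_be_resized(image_size: Tuple[int, int], shape: str) -> bool:
    """Hypothetical helper: True if a (width, height) image differs from the static H and W."""
    _, _, height, width = parse_image_input_shape(shape)
    return image_size != (width, height)


print(parse_image_input_shape("1,3,336,336"))
print(will_be_resized((640, 480), "1,3,336,336"))  # True: gets resized to 336x336
print(will_be_resized((336, 336), "1,3,336,336"))  # False: already the static size
```

An image that is resized this way no longer matches the pixels a dynamic-shape implementation would see, which is why outputs may differ from the Hugging Face reference.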
@@ -31,7 +38,7 @@ To initialize a VLM, the aforementioned arguments must be passed to the ``LLM``
         image_feature_size=576,
     )
 
-For now, we only support a single image per text prompt. To pass an image to the model, note the following in :class:`vllm.inputs.PromptStrictInputs`:
+To pass an image to the model, note the following in :class:`vllm.inputs.PromptStrictInputs`:
 
 * ``prompt``: The prompt should have a number of ``<image>`` tokens equal to ``image_feature_size``.
 * ``multi_modal_data``: This should be an instance of :class:`~vllm.multimodal.image.ImagePixelData` or :class:`~vllm.multimodal.image.ImageFeatureData`.
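The ``prompt`` rule can be sketched in plain Python: prefix the text with exactly `image_feature_size` copies of `<image>` and verify the count before submitting. `build_image_prompt` and `check_image_prompt` are hypothetical helpers, and the `USER:`/`ASSISTANT:` template is an assumed LLaVA-1.5-style format chosen only for illustration.

```python
IMAGE_TOKEN = "<image>"


def build_image_prompt(text: str, image_feature_size: int) -> str:
    """Hypothetical helper: prefix text with exactly image_feature_size image tokens."""
    return IMAGE_TOKEN * image_feature_size + text


def check_image_prompt(prompt: str, image_feature_size: int) -> None:
    """Hypothetical helper: raise ValueError if the <image> token count is wrong."""
    found = prompt.count(IMAGE_TOKEN)
    if found != image_feature_size:
        raise ValueError(
            f"prompt has {found} {IMAGE_TOKEN} tokens, expected {image_feature_size}"
        )


# Assumed LLaVA-1.5-style template, for illustration only.
prompt = build_image_prompt(
    "\nUSER: What is the content of this image?\nASSISTANT:", image_feature_size=576
)
check_image_prompt(prompt, image_feature_size=576)  # passes silently
print(prompt.count(IMAGE_TOKEN))  # 576
```

Assuming a vLLM install and a PIL image named `image`, the checked prompt would then go into one input dict such as `{"prompt": prompt, "multi_modal_data": ImagePixelData(image)}` passed to `llm.generate`; one dict carries one image, consistent with the single-image limitation.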