(multi-modality)=

# Multi-Modality

```{eval-rst}
.. currentmodule:: vllm.multimodal
```

vLLM provides experimental support for multi-modal models through the {mod}`vllm.multimodal` package.

Multi-modal inputs can be passed alongside text and token prompts to [supported models](#supported-mm-models)
via the `multi_modal_data` field in {class}`vllm.inputs.PromptType`.
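
As a minimal sketch of what such a prompt can look like, the snippet below builds a text prompt dictionary that carries an image under `multi_modal_data`. The helper names `build_image_prompt` and `generate_once`, the `"image"` modality key, and the LLaVA-1.5-style prompt template are assumptions made for illustration; the placeholder token and template vary by model, so check the model's own documentation before reusing them.

```python
from typing import Any, Dict


def build_image_prompt(question: str, image: Any) -> Dict[str, Any]:
    """Build a prompt dict that pairs text with one image.

    `image` is typically a PIL.Image.Image. The "USER: <image>..." template
    is an assumption that matches LLaVA-1.5-style models only.
    """
    return {
        "prompt": f"USER: <image>\n{question}\nASSISTANT:",
        # Multi-modal inputs travel next to the text in this field.
        "multi_modal_data": {"image": image},
    }


def generate_once(model_name: str, prompt: Dict[str, Any]) -> str:
    """Hypothetical helper: run one prompt through vLLM and return the text."""
    # Imported lazily so the prompt builder above works without vLLM installed.
    from vllm import LLM

    llm = LLM(model=model_name)
    outputs = llm.generate(prompt)
    return outputs[0].outputs[0].text
```

Keeping the prompt construction separate from the engine call makes it easy to inspect the dictionary before sending it to a model.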

To add your own multi-modal model, follow [these instructions](#enabling-multimodal-inputs).


## Module Contents

```{eval-rst}
.. automodule:: vllm.multimodal
```

### Registry

```{eval-rst}
.. autodata:: vllm.multimodal.MULTIMODAL_REGISTRY
```

```{eval-rst}
.. autoclass:: vllm.multimodal.MultiModalRegistry
    :members:
    :show-inheritance:
```

### Base Classes

```{eval-rst}
.. automodule:: vllm.multimodal.base
    :members:
    :show-inheritance:
```

### Input Classes

```{eval-rst}
.. automodule:: vllm.multimodal.inputs
    :members:
    :show-inheritance:
```

### Audio Classes

```{eval-rst}
.. automodule:: vllm.multimodal.audio
    :members:
    :show-inheritance:
```

### Image Classes

```{eval-rst}
.. automodule:: vllm.multimodal.image
    :members:
    :show-inheritance:
```

### Video Classes

```{eval-rst}
.. automodule:: vllm.multimodal.video
    :members:
    :show-inheritance:
```