(new-model-registration)=

# Registering a Model to vLLM

vLLM relies on a model registry to determine how to run each model.
A list of pre-registered architectures can be found [here](#supported-models).

If your model is not on this list, you must register it with vLLM.
This page explains how to do so.

## Built-in models

To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](#build-from-source).
This gives you the ability to modify the codebase and test your model.

After you have implemented your model (see [tutorial](#new-model-basic)), put it into the <gh-dir:vllm/model_executor/models> directory.
Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
Finally, update our [list of supported models](#supported-models) to promote your model!

:::{important}
The list of models in each section should be maintained in alphabetical order.
:::

## Out-of-tree models

You can load an external model using a plugin without modifying the vLLM codebase.

:::{seealso}
[vLLM's Plugin System](#plugin-system)
:::

To register the model, use the following code:

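```python
from vllm import ModelRegistry

# `your_code` stands for the module that defines your model class.
from your_code import YourModelForCausalLM

# Register the class under the architecture name used in the model's config.
ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
```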

If your model imports modules that initialize CUDA, consider lazy-importing it to avoid errors like `RuntimeError: Cannot re-initialize CUDA in forked subprocess`. To do so, pass the model class as a `"<module>:<class>"` string instead of importing it yourself:

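```python
from vllm import ModelRegistry

# The class is given as a "module:ClassName" string, so `your_code`
# is not imported at registration time.
ModelRegistry.register_model("YourModelForCausalLM", "your_code:YourModelForCausalLM")
```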
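To show why the string form avoids early imports, here is a minimal sketch of resolving a `"module:attribute"` reference only when it is actually needed. This is an illustration of the general technique using `importlib`, not vLLM's own implementation; the helper name `resolve_lazy_reference` is hypothetical.

```python
import importlib
from typing import Any


def resolve_lazy_reference(ref: str) -> Any:
    """Import and return the object named by a "module:attribute" string.

    Nothing is imported until this function is called, which is the idea
    behind registering a model by string rather than by class object.
    """
    module_name, sep, attr_name = ref.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected 'module:attribute', got {ref!r}")
    # The import (and any CUDA initialization it triggers) happens here.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# A standard-library class stands in for a model class:
cls = resolve_lazy_reference("collections:OrderedDict")
print(cls.__name__)  # OrderedDict
```

Deferring the import this way means the process that merely registers the model never touches modules that initialize CUDA; only the process that later resolves the reference does.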

:::{important}
If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
Read more about that [here](#supports-multimodal).
:::

:::{note}
Although you can put these snippets directly in a script that uses `vllm.LLM`, we recommend placing them in a vLLM plugin. This ensures compatibility with vLLM features such as distributed inference and the API server.
:::
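A minimal sketch of such a plugin is shown below, following the pattern described in [vLLM's Plugin System](#plugin-system). It assumes a hypothetical package `your_plugin` that exposes a `register` function through an entry point in the `vllm.general_plugins` group; treat all package, entry-point, and model names as placeholders.

```python
# Hypothetical file: your_plugin/__init__.py
#
# Assumed packaging (e.g. in setup.py), exposing `register` to vLLM:
#   entry_points={
#       "vllm.general_plugins": ["register_your_model = your_plugin:register"]
#   }


def register() -> None:
    """Register the out-of-tree model with vLLM's model registry."""
    # Import vLLM inside the function so that importing the plugin package
    # itself has no side effects.
    from vllm import ModelRegistry

    # Skip registration if the architecture is already known, since this
    # function may be called more than once (for example, in several processes).
    if "YourModelForCausalLM" not in ModelRegistry.get_supported_archs():
        # The "module:ClassName" string keeps the model import lazy.
        ModelRegistry.register_model(
            "YourModelForCausalLM", "your_code:YourModelForCausalLM"
        )
```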