
.. _supported_models:

Supported Models
================

vLLM supports a variety of generative Transformer models in `HuggingFace Transformers <https://huggingface.co/models>`_.
The following is the list of model architectures that are currently supported by vLLM.
Alongside each architecture, we include some popular models that use it.

.. list-table::
  :widths: 25 25 50
  :header-rows: 1

  * - Architecture
    - Models
    - Example HuggingFace Models
  * - :code:`AquilaForCausalLM`
    - Aquila
    - :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
  * - :code:`BaiChuanForCausalLM`
    - Baichuan
    - :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
  * - :code:`ChatGLMModel`
    - ChatGLM
    - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
  * - :code:`DeciLMForCausalLM`
    - DeciLM
    - :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
  * - :code:`BloomForCausalLM`
    - BLOOM, BLOOMZ, BLOOMChat
    - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
  * - :code:`FalconForCausalLM`
    - Falcon
    - :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
  * - :code:`GPT2LMHeadModel`
    - GPT-2
    - :code:`gpt2`, :code:`gpt2-xl`, etc.
  * - :code:`GPTBigCodeForCausalLM`
    - StarCoder, SantaCoder, WizardCoder
    - :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
  * - :code:`GPTJForCausalLM`
    - GPT-J
    - :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
  * - :code:`GPTNeoXForCausalLM`
    - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
    - :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
  * - :code:`InternLMForCausalLM`
    - InternLM
    - :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
  * - :code:`LlamaForCausalLM`
    - LLaMA, LLaMA-2, Vicuna, Alpaca, Koala, Guanaco
    - :code:`meta-llama/Llama-2-13b-hf`, :code:`meta-llama/Llama-2-70b-hf`, :code:`openlm-research/open_llama_13b`, :code:`lmsys/vicuna-13b-v1.3`, :code:`young-geng/koala`, etc.
  * - :code:`MistralForCausalLM`
    - Mistral, Mistral-Instruct
    - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
  * - :code:`MixtralForCausalLM`
    - Mixtral-8x7B, Mixtral-8x7B-Instruct
    - :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, etc.
  * - :code:`MPTForCausalLM`
    - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
    - :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
  * - :code:`OPTForCausalLM`
    - OPT, OPT-IML
    - :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
  * - :code:`PhiForCausalLM`
    - Phi
    - :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
  * - :code:`QWenLMHeadModel`
    - Qwen
    - :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
  * - :code:`Qwen2ForCausalLM`
    - Qwen2
    - :code:`Qwen/Qwen2-beta-7B`, :code:`Qwen/Qwen2-beta-7B-Chat`, etc.
  * - :code:`StableLMEpochForCausalLM`
    - StableLM
    - :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
  * - :code:`YiForCausalLM`
    - Yi
    - :code:`01-ai/Yi-6B`, :code:`01-ai/Yi-34B`, etc.

If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` for instructions on how to implement support for your model.
Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ project.
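
As a rough pre-check, the architecture a model declares can be compared against the table above.
The sketch below assumes the model repository ships a :code:`config.json` with an :code:`architectures` field, as HuggingFace Transformers models typically do.
The helper name :code:`listed_architectures` and the sample configuration are hypothetical, and the set of names is a hand-copied snapshot of the table, not something read from vLLM itself.

.. code-block:: python

    import json

    # Architecture names copied by hand from the table above (a snapshot, not
    # vLLM's own registry).
    SUPPORTED_ARCHITECTURES = {
        "AquilaForCausalLM", "BaiChuanForCausalLM", "ChatGLMModel",
        "DeciLMForCausalLM", "BloomForCausalLM", "FalconForCausalLM",
        "GPT2LMHeadModel", "GPTBigCodeForCausalLM", "GPTJForCausalLM",
        "GPTNeoXForCausalLM", "InternLMForCausalLM", "LlamaForCausalLM",
        "MistralForCausalLM", "MixtralForCausalLM", "MPTForCausalLM",
        "OPTForCausalLM", "PhiForCausalLM", "QWenLMHeadModel",
        "Qwen2ForCausalLM", "StableLMEpochForCausalLM", "YiForCausalLM",
    }


    def listed_architectures(config_text):
        """Return the declared architectures that also appear in the table."""
        config = json.loads(config_text)
        # A missing or null "architectures" field is treated as an empty list.
        declared = config.get("architectures") or []
        return [name for name in declared if name in SUPPORTED_ARCHITECTURES]


    # Hypothetical, trimmed-down config.json contents for illustration.
    example_config = '{"architectures": ["LlamaForCausalLM"], "model_type": "llama"}'
    print(listed_architectures(example_config))  # ['LlamaForCausalLM']

An empty result only means the name is not in this snapshot of the table; actually loading the model with vLLM remains the definitive check.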

.. note::
    Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.

.. tip::
    The easiest way to check if your model is supported is to run the program below:

    .. code-block:: python

        from vllm import LLM

        llm = LLM(model=...)  # Name or path of your model
        output = llm.generate("Hello, my name is")
        print(output)

    If vLLM generates text successfully, your model is supported.

.. tip::
    To use models from `ModelScope <https://www.modelscope.cn>`_ instead of HuggingFace Hub, set an environment variable:

    .. code-block:: shell

       $ export VLLM_USE_MODELSCOPE=True

    Then pass :code:`trust_remote_code=True` when creating the :code:`LLM` object:

    .. code-block:: python

        from vllm import LLM

        llm = LLM(model=..., revision=..., trust_remote_code=True)  # Name or path of your model
        output = llm.generate("Hello, my name is")
        print(output)
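
When exporting the variable in the shell is inconvenient (for example, in a notebook), it can also be set from Python.
The following is a minimal sketch, assuming vLLM reads :code:`VLLM_USE_MODELSCOPE` from the process environment no earlier than import time; the helper name :code:`load_from_modelscope` is hypothetical.

.. code-block:: python

    import os

    # Set the variable before vLLM is imported, so it is in place whenever
    # vLLM reads it (assumption: it is read from the process environment).
    os.environ["VLLM_USE_MODELSCOPE"] = "True"


    def load_from_modelscope(model, revision=None):
        """Hypothetical helper: create an LLM with ModelScope downloads enabled."""
        from vllm import LLM  # imported only after the variable has been set

        return LLM(model=model, revision=revision, trust_remote_code=True)

Calling :code:`load_from_modelscope(...)` with the name or path of your model then returns an :code:`LLM` object that can be used with :code:`generate` as in the examples above.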