.. _supported_models:

Supported Models
================

vLLM supports a variety of generative Transformer models in `HuggingFace Transformers <https://huggingface.co/models>`_.
The following is the list of model architectures that are currently supported by vLLM.
Alongside each architecture, we include some popular models that use it.

.. list-table::
  :widths: 25 25 50
  :header-rows: 1

  * - Architecture
    - Models
    - Example HuggingFace Models
  * - :code:`AquilaForCausalLM`
    - Aquila
    - :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
  * - :code:`BaiChuanForCausalLM`
    - Baichuan
    - :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
  * - :code:`BloomForCausalLM`
    - BLOOM, BLOOMZ, BLOOMChat
    - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
  * - :code:`ChatGLMModel`
    - ChatGLM
    - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
  * - :code:`DeciLMForCausalLM`
    - DeciLM
    - :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
  * - :code:`FalconForCausalLM`
    - Falcon
    - :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
  * - :code:`GPT2LMHeadModel`
    - GPT-2
    - :code:`gpt2`, :code:`gpt2-xl`, etc.
  * - :code:`GPTBigCodeForCausalLM`
    - StarCoder, SantaCoder, WizardCoder
    - :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
  * - :code:`GPTJForCausalLM`
    - GPT-J
    - :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
  * - :code:`GPTNeoXForCausalLM`
    - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
    - :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
  * - :code:`InternLMForCausalLM`
    - InternLM
    - :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
  * - :code:`LlamaForCausalLM`
    - LLaMA, LLaMA-2, Vicuna, Alpaca, Koala, Guanaco
    - :code:`meta-llama/Llama-2-13b-hf`, :code:`meta-llama/Llama-2-70b-hf`, :code:`openlm-research/open_llama_13b`, :code:`lmsys/vicuna-13b-v1.3`, :code:`young-geng/koala`, etc.
  * - :code:`MistralForCausalLM`
    - Mistral, Mistral-Instruct
    - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
  * - :code:`MixtralForCausalLM`
    - Mixtral-8x7B, Mixtral-8x7B-Instruct
    - :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, etc.
  * - :code:`MPTForCausalLM`
    - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
    - :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
  * - :code:`OPTForCausalLM`
    - OPT, OPT-IML
    - :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
  * - :code:`PhiForCausalLM`
    - Phi
    - :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
  * - :code:`QWenLMHeadModel`
    - Qwen
    - :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
  * - :code:`StableLMEpochForCausalLM`
    - StableLM
    - :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
  * - :code:`YiForCausalLM`
    - Yi
    - :code:`01-ai/Yi-6B`, :code:`01-ai/Yi-34B`, etc.

If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` for instructions on how to implement support for your model.
Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ project.
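
For example, here is a minimal sketch of running one of the models listed above. The checkpoint :code:`facebook/opt-125m` is assumed purely because it is small; any checkpoint whose architecture appears in the table can be used the same way.

.. code-block:: python

    from vllm import LLM, SamplingParams

    # Load any checkpoint whose architecture appears in the table above.
    # facebook/opt-125m is only an illustrative placeholder.
    llm = LLM(model="facebook/opt-125m")

    # Sampling settings are arbitrary values chosen for this sketch.
    sampling_params = SamplingParams(temperature=0.8, max_tokens=32)
    outputs = llm.generate(["Hello, my name is"], sampling_params)

    for output in outputs:
        print(output.outputs[0].text)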

.. note::
    Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
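
    One way to stay within this limit is to cap the context window when loading the model; a minimal sketch, assuming the :code:`max_model_len` argument of :code:`LLM`:

    .. code-block:: python

        from vllm import LLM

        # Illustration only: limit the context length to 4096 tokens on ROCm.
        llm = LLM(model="mistralai/Mistral-7B-v0.1", max_model_len=4096)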

.. tip::
    The easiest way to check if your model is supported is to run the program below:

    .. code-block:: python

        from vllm import LLM

        llm = LLM(model=...)  # Name or path of your model
        output = llm.generate("Hello, my name is")
        print(output)

    If vLLM successfully generates text, it indicates that your model is supported.

.. tip::
    To use models from `ModelScope <https://www.modelscope.cn>`_ instead of HuggingFace Hub, set an environment variable:

    .. code-block:: shell

        $ export VLLM_USE_MODELSCOPE=True

    Then load the model with :code:`trust_remote_code=True`:

    .. code-block:: python

        from vllm import LLM

        llm = LLM(model=..., revision=..., trust_remote_code=True)  # Name or path of your model
        output = llm.generate("Hello, my name is")
        print(output)
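
    If you prefer not to export the variable in your shell, here is a sketch of setting it from Python instead; this assumes the variable is read when vLLM is first imported, so it must be set before the import.

    .. code-block:: python

        import os

        # Assumption: must be set before importing vllm so it is picked up at import time.
        os.environ["VLLM_USE_MODELSCOPE"] = "True"

        from vllm import LLM

        llm = LLM(model=..., revision=..., trust_remote_code=True)  # Name or path of your model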