[Doc] Convert list tables to MyST (#11594)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
This commit is contained in: parent 4fb8e329fd, commit 32b4c63f02
@@ -197,4 +197,4 @@ if __name__ == '__main__':
 ## Known Issues
 
 - In `v0.5.2`, `v0.5.3`, and `v0.5.3.post1`, there is a bug caused by [zmq](https://github.com/zeromq/pyzmq/issues/2000), which can occasionally cause vLLM to hang depending on the machine configuration. The solution is to upgrade to the latest version of `vllm` to include the [fix](gh-pr:6759).
-- To circumvent a NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234), all vLLM processes will set an environment variable ``NCCL_CUMEM_ENABLE=0`` to disable NCCL's ``cuMem`` allocator. It does not affect performance but only gives memory benefits. When external processes want to set up a NCCL connection with vLLM's processes, they should also set this environment variable, otherwise, inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656).
+- To circumvent a NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234), all vLLM processes will set an environment variable `NCCL_CUMEM_ENABLE=0` to disable NCCL's `cuMem` allocator. It does not affect performance but only gives memory benefits. When external processes want to set up a NCCL connection with vLLM's processes, they should also set this environment variable, otherwise, inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656).
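The environment-setup requirement in the NCCL note above is easy to get wrong from an external process; a minimal Python sketch, assuming the peer process prepares its environment before creating any NCCL communicator (`match_vllm_nccl_env` is a hypothetical helper, not part of vLLM):

```python
import os

def match_vllm_nccl_env(env=None):
    # Hypothetical helper: mirror the setting vLLM processes apply, so an
    # external peer joining an NCCL group sees a consistent environment.
    env = dict(os.environ if env is None else env)
    # vLLM exports NCCL_CUMEM_ENABLE=0 to disable NCCL's cuMem allocator;
    # a peer must set it *before* initializing NCCL, or NCCL may hang/crash.
    # setdefault keeps any value the user has already set explicitly.
    env.setdefault("NCCL_CUMEM_ENABLE", "0")
    return env

print(match_vllm_nccl_env({})["NCCL_CUMEM_ENABLE"])  # 0
```

The key point is ordering: the variable must be in the environment before the first NCCL call, not after.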
@@ -141,13 +141,12 @@ Gaudi2 devices. Configurations that are not listed may or may not work.
 
 Currently in vLLM for HPU we support four execution modes, depending on selected HPU PyTorch Bridge backend (via `PT_HPU_LAZY_MODE` environment variable), and `--enforce-eager` flag.
 
-```{eval-rst}
-.. list-table:: vLLM execution modes
+```{list-table} vLLM execution modes
 :widths: 25 25 50
 :header-rows: 1
 
-* - ``PT_HPU_LAZY_MODE``
-  - ``enforce_eager``
+* - `PT_HPU_LAZY_MODE`
+  - `enforce_eager`
   - execution mode
 * - 0
   - 0
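The two knobs in the paragraph above jointly select the execution mode; a rough sketch of reading that pair, where `select_hpu_mode` and the default of `"1"` for an unset `PT_HPU_LAZY_MODE` are assumptions for illustration, not vLLM's actual dispatch code:

```python
import os

def select_hpu_mode(enforce_eager, environ=None):
    # The execution mode is keyed on the pair
    # (PT_HPU_LAZY_MODE, enforce_eager); see the table in the hunk above.
    # Defaulting PT_HPU_LAZY_MODE to "1" is an assumption for this sketch.
    environ = os.environ if environ is None else environ
    lazy_mode = int(environ.get("PT_HPU_LAZY_MODE", "1"))
    return (lazy_mode, bool(enforce_eager))

print(select_hpu_mode(False, {"PT_HPU_LAZY_MODE": "0"}))  # (0, False)
```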
@@ -68,8 +68,7 @@ gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
   --service-account SERVICE_ACCOUNT
 ```
 
-```{eval-rst}
-.. list-table:: Parameter descriptions
+```{list-table} Parameter descriptions
 :header-rows: 1
 
 * - Parameter name
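The mechanical part of the conversion in these hunks (RST `:code:` roles becoming plain backticks, `:ref:` roles becoming Markdown links) can be approximated with two regular expressions; a rough sketch of the pattern, not the script actually used for this commit:

```python
import re

def rst_roles_to_myst(line):
    # :code:`x` -> `x`
    line = re.sub(r":code:`([^`]+)`", r"`\1`", line)
    # :ref:`Text <target>` -> [Text](#target)
    line = re.sub(r":ref:`([^<`]+?)\s*<([^>`]+)>`", r"[\1](#\2)", line)
    return line

print(rst_roles_to_myst("* - :code:`AquilaForCausalLM`"))
print(rst_roles_to_myst("- :ref:`LoRA <lora-adapter>`"))
```

A real conversion would also rewrite the `{eval-rst}` fences themselves, as the surrounding hunks show.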
@@ -72,289 +72,288 @@ See [this page](#generative-models) for more information on how to use generativ
 
 #### Text Generation (`--task generate`)
 
-```{eval-rst}
-.. list-table::
+```{list-table}
 :widths: 25 25 50 5 5
 :header-rows: 1
 
 * - Architecture
   - Models
   - Example HF Models
-  - :ref:`LoRA <lora-adapter>`
-  - :ref:`PP <distributed-serving>`
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
-* - :code:`AquilaForCausalLM`
+* - `AquilaForCausalLM`
   - Aquila, Aquila2
-  - :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
+  - `BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc.
   - ✅︎
   - ✅︎
-* - :code:`ArcticForCausalLM`
+* - `ArcticForCausalLM`
   - Arctic
-  - :code:`Snowflake/snowflake-arctic-base`, :code:`Snowflake/snowflake-arctic-instruct`, etc.
+  - `Snowflake/snowflake-arctic-base`, `Snowflake/snowflake-arctic-instruct`, etc.
   -
   - ✅︎
-* - :code:`BaiChuanForCausalLM`
+* - `BaiChuanForCausalLM`
   - Baichuan2, Baichuan
-  - :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
+  - `baichuan-inc/Baichuan2-13B-Chat`, `baichuan-inc/Baichuan-7B`, etc.
   - ✅︎
   - ✅︎
-* - :code:`BloomForCausalLM`
+* - `BloomForCausalLM`
   - BLOOM, BLOOMZ, BLOOMChat
-  - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
+  - `bigscience/bloom`, `bigscience/bloomz`, etc.
   -
   - ✅︎
-* - :code:`BartForConditionalGeneration`
+* - `BartForConditionalGeneration`
   - BART
-  - :code:`facebook/bart-base`, :code:`facebook/bart-large-cnn`, etc.
+  - `facebook/bart-base`, `facebook/bart-large-cnn`, etc.
   -
   -
-* - :code:`ChatGLMModel`
+* - `ChatGLMModel`
   - ChatGLM
-  - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
+  - `THUDM/chatglm2-6b`, `THUDM/chatglm3-6b`, etc.
   - ✅︎
   - ✅︎
-* - :code:`CohereForCausalLM`, :code:`Cohere2ForCausalLM`
+* - `CohereForCausalLM`, `Cohere2ForCausalLM`
   - Command-R
-  - :code:`CohereForAI/c4ai-command-r-v01`, :code:`CohereForAI/c4ai-command-r7b-12-2024`, etc.
+  - `CohereForAI/c4ai-command-r-v01`, `CohereForAI/c4ai-command-r7b-12-2024`, etc.
   - ✅︎
   - ✅︎
-* - :code:`DbrxForCausalLM`
+* - `DbrxForCausalLM`
   - DBRX
-  - :code:`databricks/dbrx-base`, :code:`databricks/dbrx-instruct`, etc.
+  - `databricks/dbrx-base`, `databricks/dbrx-instruct`, etc.
   -
   - ✅︎
-* - :code:`DeciLMForCausalLM`
+* - `DeciLMForCausalLM`
   - DeciLM
-  - :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
+  - `Deci/DeciLM-7B`, `Deci/DeciLM-7B-instruct`, etc.
   -
   - ✅︎
-* - :code:`DeepseekForCausalLM`
+* - `DeepseekForCausalLM`
   - DeepSeek
-  - :code:`deepseek-ai/deepseek-llm-67b-base`, :code:`deepseek-ai/deepseek-llm-7b-chat` etc.
+  - `deepseek-ai/deepseek-llm-67b-base`, `deepseek-ai/deepseek-llm-7b-chat` etc.
   -
   - ✅︎
-* - :code:`DeepseekV2ForCausalLM`
+* - `DeepseekV2ForCausalLM`
   - DeepSeek-V2
-  - :code:`deepseek-ai/DeepSeek-V2`, :code:`deepseek-ai/DeepSeek-V2-Chat` etc.
+  - `deepseek-ai/DeepSeek-V2`, `deepseek-ai/DeepSeek-V2-Chat` etc.
   -
   - ✅︎
-* - :code:`DeepseekV3ForCausalLM`
+* - `DeepseekV3ForCausalLM`
   - DeepSeek-V3
-  - :code:`deepseek-ai/DeepSeek-V3-Base`, :code:`deepseek-ai/DeepSeek-V3` etc.
+  - `deepseek-ai/DeepSeek-V3-Base`, `deepseek-ai/DeepSeek-V3` etc.
   -
   - ✅︎
-* - :code:`ExaoneForCausalLM`
+* - `ExaoneForCausalLM`
   - EXAONE-3
-  - :code:`LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
+  - `LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
   - ✅︎
   - ✅︎
-* - :code:`FalconForCausalLM`
+* - `FalconForCausalLM`
   - Falcon
-  - :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
+  - `tiiuae/falcon-7b`, `tiiuae/falcon-40b`, `tiiuae/falcon-rw-7b`, etc.
   -
   - ✅︎
-* - :code:`FalconMambaForCausalLM`
+* - `FalconMambaForCausalLM`
   - FalconMamba
-  - :code:`tiiuae/falcon-mamba-7b`, :code:`tiiuae/falcon-mamba-7b-instruct`, etc.
+  - `tiiuae/falcon-mamba-7b`, `tiiuae/falcon-mamba-7b-instruct`, etc.
   - ✅︎
   - ✅︎
-* - :code:`GemmaForCausalLM`
+* - `GemmaForCausalLM`
   - Gemma
-  - :code:`google/gemma-2b`, :code:`google/gemma-7b`, etc.
+  - `google/gemma-2b`, `google/gemma-7b`, etc.
   - ✅︎
   - ✅︎
-* - :code:`Gemma2ForCausalLM`
+* - `Gemma2ForCausalLM`
   - Gemma2
-  - :code:`google/gemma-2-9b`, :code:`google/gemma-2-27b`, etc.
+  - `google/gemma-2-9b`, `google/gemma-2-27b`, etc.
   - ✅︎
   - ✅︎
-* - :code:`GlmForCausalLM`
+* - `GlmForCausalLM`
   - GLM-4
-  - :code:`THUDM/glm-4-9b-chat-hf`, etc.
+  - `THUDM/glm-4-9b-chat-hf`, etc.
   - ✅︎
   - ✅︎
-* - :code:`GPT2LMHeadModel`
+* - `GPT2LMHeadModel`
   - GPT-2
-  - :code:`gpt2`, :code:`gpt2-xl`, etc.
+  - `gpt2`, `gpt2-xl`, etc.
   -
   - ✅︎
-* - :code:`GPTBigCodeForCausalLM`
+* - `GPTBigCodeForCausalLM`
   - StarCoder, SantaCoder, WizardCoder
-  - :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
+  - `bigcode/starcoder`, `bigcode/gpt_bigcode-santacoder`, `WizardLM/WizardCoder-15B-V1.0`, etc.
   - ✅︎
   - ✅︎
-* - :code:`GPTJForCausalLM`
+* - `GPTJForCausalLM`
   - GPT-J
-  - :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
+  - `EleutherAI/gpt-j-6b`, `nomic-ai/gpt4all-j`, etc.
   -
   - ✅︎
-* - :code:`GPTNeoXForCausalLM`
+* - `GPTNeoXForCausalLM`
   - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
-  - :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
+  - `EleutherAI/gpt-neox-20b`, `EleutherAI/pythia-12b`, `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, `databricks/dolly-v2-12b`, `stabilityai/stablelm-tuned-alpha-7b`, etc.
   -
   - ✅︎
-* - :code:`GraniteForCausalLM`
+* - `GraniteForCausalLM`
   - Granite 3.0, Granite 3.1, PowerLM
-  - :code:`ibm-granite/granite-3.0-2b-base`, :code:`ibm-granite/granite-3.1-8b-instruct`, :code:`ibm/PowerLM-3b`, etc.
+  - `ibm-granite/granite-3.0-2b-base`, `ibm-granite/granite-3.1-8b-instruct`, `ibm/PowerLM-3b`, etc.
   - ✅︎
   - ✅︎
-* - :code:`GraniteMoeForCausalLM`
+* - `GraniteMoeForCausalLM`
   - Granite 3.0 MoE, PowerMoE
-  - :code:`ibm-granite/granite-3.0-1b-a400m-base`, :code:`ibm-granite/granite-3.0-3b-a800m-instruct`, :code:`ibm/PowerMoE-3b`, etc.
+  - `ibm-granite/granite-3.0-1b-a400m-base`, `ibm-granite/granite-3.0-3b-a800m-instruct`, `ibm/PowerMoE-3b`, etc.
   - ✅︎
   - ✅︎
-* - :code:`GritLM`
+* - `GritLM`
   - GritLM
-  - :code:`parasail-ai/GritLM-7B-vllm`.
+  - `parasail-ai/GritLM-7B-vllm`.
   - ✅︎
   - ✅︎
-* - :code:`InternLMForCausalLM`
+* - `InternLMForCausalLM`
   - InternLM
-  - :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
+  - `internlm/internlm-7b`, `internlm/internlm-chat-7b`, etc.
   - ✅︎
   - ✅︎
-* - :code:`InternLM2ForCausalLM`
+* - `InternLM2ForCausalLM`
   - InternLM2
-  - :code:`internlm/internlm2-7b`, :code:`internlm/internlm2-chat-7b`, etc.
+  - `internlm/internlm2-7b`, `internlm/internlm2-chat-7b`, etc.
   - ✅︎
   - ✅︎
-* - :code:`JAISLMHeadModel`
+* - `JAISLMHeadModel`
   - Jais
-  - :code:`inceptionai/jais-13b`, :code:`inceptionai/jais-13b-chat`, :code:`inceptionai/jais-30b-v3`, :code:`inceptionai/jais-30b-chat-v3`, etc.
+  - `inceptionai/jais-13b`, `inceptionai/jais-13b-chat`, `inceptionai/jais-30b-v3`, `inceptionai/jais-30b-chat-v3`, etc.
   -
   - ✅︎
-* - :code:`JambaForCausalLM`
+* - `JambaForCausalLM`
   - Jamba
-  - :code:`ai21labs/AI21-Jamba-1.5-Large`, :code:`ai21labs/AI21-Jamba-1.5-Mini`, :code:`ai21labs/Jamba-v0.1`, etc.
+  - `ai21labs/AI21-Jamba-1.5-Large`, `ai21labs/AI21-Jamba-1.5-Mini`, `ai21labs/Jamba-v0.1`, etc.
   - ✅︎
   - ✅︎
-* - :code:`LlamaForCausalLM`
+* - `LlamaForCausalLM`
   - Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
-  - :code:`meta-llama/Meta-Llama-3.1-405B-Instruct`, :code:`meta-llama/Meta-Llama-3.1-70B`, :code:`meta-llama/Meta-Llama-3-70B-Instruct`, :code:`meta-llama/Llama-2-70b-hf`, :code:`01-ai/Yi-34B`, etc.
+  - `meta-llama/Meta-Llama-3.1-405B-Instruct`, `meta-llama/Meta-Llama-3.1-70B`, `meta-llama/Meta-Llama-3-70B-Instruct`, `meta-llama/Llama-2-70b-hf`, `01-ai/Yi-34B`, etc.
   - ✅︎
   - ✅︎
-* - :code:`MambaForCausalLM`
+* - `MambaForCausalLM`
   - Mamba
-  - :code:`state-spaces/mamba-130m-hf`, :code:`state-spaces/mamba-790m-hf`, :code:`state-spaces/mamba-2.8b-hf`, etc.
+  - `state-spaces/mamba-130m-hf`, `state-spaces/mamba-790m-hf`, `state-spaces/mamba-2.8b-hf`, etc.
   -
   - ✅︎
-* - :code:`MiniCPMForCausalLM`
+* - `MiniCPMForCausalLM`
   - MiniCPM
-  - :code:`openbmb/MiniCPM-2B-sft-bf16`, :code:`openbmb/MiniCPM-2B-dpo-bf16`, :code:`openbmb/MiniCPM-S-1B-sft`, etc.
+  - `openbmb/MiniCPM-2B-sft-bf16`, `openbmb/MiniCPM-2B-dpo-bf16`, `openbmb/MiniCPM-S-1B-sft`, etc.
   - ✅︎
   - ✅︎
-* - :code:`MiniCPM3ForCausalLM`
+* - `MiniCPM3ForCausalLM`
   - MiniCPM3
-  - :code:`openbmb/MiniCPM3-4B`, etc.
+  - `openbmb/MiniCPM3-4B`, etc.
   - ✅︎
   - ✅︎
-* - :code:`MistralForCausalLM`
+* - `MistralForCausalLM`
   - Mistral, Mistral-Instruct
-  - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
+  - `mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.1`, etc.
   - ✅︎
   - ✅︎
-* - :code:`MixtralForCausalLM`
+* - `MixtralForCausalLM`
   - Mixtral-8x7B, Mixtral-8x7B-Instruct
-  - :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, :code:`mistral-community/Mixtral-8x22B-v0.1`, etc.
+  - `mistralai/Mixtral-8x7B-v0.1`, `mistralai/Mixtral-8x7B-Instruct-v0.1`, `mistral-community/Mixtral-8x22B-v0.1`, etc.
   - ✅︎
   - ✅︎
-* - :code:`MPTForCausalLM`
+* - `MPTForCausalLM`
   - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
-  - :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
+  - `mosaicml/mpt-7b`, `mosaicml/mpt-7b-storywriter`, `mosaicml/mpt-30b`, etc.
   -
   - ✅︎
-* - :code:`NemotronForCausalLM`
+* - `NemotronForCausalLM`
   - Nemotron-3, Nemotron-4, Minitron
-  - :code:`nvidia/Minitron-8B-Base`, :code:`mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
+  - `nvidia/Minitron-8B-Base`, `mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
   - ✅︎
   - ✅︎
-* - :code:`OLMoForCausalLM`
+* - `OLMoForCausalLM`
   - OLMo
-  - :code:`allenai/OLMo-1B-hf`, :code:`allenai/OLMo-7B-hf`, etc.
+  - `allenai/OLMo-1B-hf`, `allenai/OLMo-7B-hf`, etc.
   -
   - ✅︎
-* - :code:`OLMo2ForCausalLM`
+* - `OLMo2ForCausalLM`
   - OLMo2
-  - :code:`allenai/OLMo2-7B-1124`, etc.
+  - `allenai/OLMo2-7B-1124`, etc.
   -
   - ✅︎
-* - :code:`OLMoEForCausalLM`
+* - `OLMoEForCausalLM`
   - OLMoE
-  - :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
+  - `allenai/OLMoE-1B-7B-0924`, `allenai/OLMoE-1B-7B-0924-Instruct`, etc.
   - ✅︎
   - ✅︎
-* - :code:`OPTForCausalLM`
+* - `OPTForCausalLM`
   - OPT, OPT-IML
-  - :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
+  - `facebook/opt-66b`, `facebook/opt-iml-max-30b`, etc.
   -
   - ✅︎
-* - :code:`OrionForCausalLM`
+* - `OrionForCausalLM`
   - Orion
-  - :code:`OrionStarAI/Orion-14B-Base`, :code:`OrionStarAI/Orion-14B-Chat`, etc.
+  - `OrionStarAI/Orion-14B-Base`, `OrionStarAI/Orion-14B-Chat`, etc.
   -
   - ✅︎
-* - :code:`PhiForCausalLM`
+* - `PhiForCausalLM`
   - Phi
-  - :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
+  - `microsoft/phi-1_5`, `microsoft/phi-2`, etc.
   - ✅︎
   - ✅︎
-* - :code:`Phi3ForCausalLM`
+* - `Phi3ForCausalLM`
   - Phi-3
-  - :code:`microsoft/Phi-3-mini-4k-instruct`, :code:`microsoft/Phi-3-mini-128k-instruct`, :code:`microsoft/Phi-3-medium-128k-instruct`, etc.
+  - `microsoft/Phi-3-mini-4k-instruct`, `microsoft/Phi-3-mini-128k-instruct`, `microsoft/Phi-3-medium-128k-instruct`, etc.
   - ✅︎
   - ✅︎
-* - :code:`Phi3SmallForCausalLM`
+* - `Phi3SmallForCausalLM`
   - Phi-3-Small
-  - :code:`microsoft/Phi-3-small-8k-instruct`, :code:`microsoft/Phi-3-small-128k-instruct`, etc.
+  - `microsoft/Phi-3-small-8k-instruct`, `microsoft/Phi-3-small-128k-instruct`, etc.
   -
   - ✅︎
-* - :code:`PhiMoEForCausalLM`
+* - `PhiMoEForCausalLM`
   - Phi-3.5-MoE
-  - :code:`microsoft/Phi-3.5-MoE-instruct`, etc.
+  - `microsoft/Phi-3.5-MoE-instruct`, etc.
   - ✅︎
   - ✅︎
-* - :code:`PersimmonForCausalLM`
+* - `PersimmonForCausalLM`
   - Persimmon
-  - :code:`adept/persimmon-8b-base`, :code:`adept/persimmon-8b-chat`, etc.
+  - `adept/persimmon-8b-base`, `adept/persimmon-8b-chat`, etc.
   -
   - ✅︎
-* - :code:`QWenLMHeadModel`
+* - `QWenLMHeadModel`
   - Qwen
-  - :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
+  - `Qwen/Qwen-7B`, `Qwen/Qwen-7B-Chat`, etc.
   - ✅︎
   - ✅︎
-* - :code:`Qwen2ForCausalLM`
+* - `Qwen2ForCausalLM`
   - Qwen2
-  - :code:`Qwen/QwQ-32B-Preview`, :code:`Qwen/Qwen2-7B-Instruct`, :code:`Qwen/Qwen2-7B`, etc.
+  - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
   - ✅︎
   - ✅︎
-* - :code:`Qwen2MoeForCausalLM`
+* - `Qwen2MoeForCausalLM`
   - Qwen2MoE
-  - :code:`Qwen/Qwen1.5-MoE-A2.7B`, :code:`Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
+  - `Qwen/Qwen1.5-MoE-A2.7B`, `Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
   -
   - ✅︎
-* - :code:`StableLmForCausalLM`
+* - `StableLmForCausalLM`
   - StableLM
-  - :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
+  - `stabilityai/stablelm-3b-4e1t`, `stabilityai/stablelm-base-alpha-7b-v2`, etc.
   -
   - ✅︎
-* - :code:`Starcoder2ForCausalLM`
+* - `Starcoder2ForCausalLM`
   - Starcoder2
-  - :code:`bigcode/starcoder2-3b`, :code:`bigcode/starcoder2-7b`, :code:`bigcode/starcoder2-15b`, etc.
+  - `bigcode/starcoder2-3b`, `bigcode/starcoder2-7b`, `bigcode/starcoder2-15b`, etc.
   -
   - ✅︎
-* - :code:`SolarForCausalLM`
+* - `SolarForCausalLM`
   - Solar Pro
-  - :code:`upstage/solar-pro-preview-instruct`, etc.
+  - `upstage/solar-pro-preview-instruct`, etc.
   - ✅︎
   - ✅︎
-* - :code:`TeleChat2ForCausalLM`
+* - `TeleChat2ForCausalLM`
   - TeleChat2
-  - :code:`TeleAI/TeleChat2-3B`, :code:`TeleAI/TeleChat2-7B`, :code:`TeleAI/TeleChat2-35B`, etc.
+  - `TeleAI/TeleChat2-3B`, `TeleAI/TeleChat2-7B`, `TeleAI/TeleChat2-35B`, etc.
   - ✅︎
   - ✅︎
-* - :code:`XverseForCausalLM`
+* - `XverseForCausalLM`
   - XVERSE
-  - :code:`xverse/XVERSE-7B-Chat`, :code:`xverse/XVERSE-13B-Chat`, :code:`xverse/XVERSE-65B-Chat`, etc.
+  - `xverse/XVERSE-7B-Chat`, `xverse/XVERSE-13B-Chat`, `xverse/XVERSE-65B-Chat`, etc.
   - ✅︎
   - ✅︎
 ```
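The LoRA/PP columns in the table above are the kind of data a reader may want to query programmatically; a sketch with a few rows transcribed from the table (this dict is illustrative only, not vLLM's model registry):

```python
# A few rows transcribed from the text-generation table above:
# architecture -> (LoRA support, pipeline-parallel support).
TEXT_GEN_SUPPORT = {
    "AquilaForCausalLM": (True, True),
    "ArcticForCausalLM": (False, True),
    "BartForConditionalGeneration": (False, False),
    "LlamaForCausalLM": (True, True),
    "MambaForCausalLM": (False, True),
}

def supports(arch, feature):
    # feature is "lora" or "pp", matching the two checkmark columns.
    lora, pp = TEXT_GEN_SUPPORT[arch]
    return {"lora": lora, "pp": pp}[feature]

print(supports("LlamaForCausalLM", "lora"))  # True
```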
@@ -374,49 +373,48 @@ you should explicitly specify the task type to ensure that the model is used in
 
 #### Text Embedding (`--task embed`)
 
-```{eval-rst}
-.. list-table::
+```{list-table}
 :widths: 25 25 50 5 5
 :header-rows: 1
 
 * - Architecture
   - Models
   - Example HF Models
-  - :ref:`LoRA <lora-adapter>`
-  - :ref:`PP <distributed-serving>`
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
-* - :code:`BertModel`
+* - `BertModel`
   - BERT-based
-  - :code:`BAAI/bge-base-en-v1.5`, etc.
+  - `BAAI/bge-base-en-v1.5`, etc.
   -
   -
-* - :code:`Gemma2Model`
+* - `Gemma2Model`
   - Gemma2-based
-  - :code:`BAAI/bge-multilingual-gemma2`, etc.
+  - `BAAI/bge-multilingual-gemma2`, etc.
   -
   - ✅︎
-* - :code:`GritLM`
+* - `GritLM`
   - GritLM
-  - :code:`parasail-ai/GritLM-7B-vllm`.
+  - `parasail-ai/GritLM-7B-vllm`.
   - ✅︎
   - ✅︎
-* - :code:`LlamaModel`, :code:`LlamaForCausalLM`, :code:`MistralModel`, etc.
+* - `LlamaModel`, `LlamaForCausalLM`, `MistralModel`, etc.
   - Llama-based
-  - :code:`intfloat/e5-mistral-7b-instruct`, etc.
+  - `intfloat/e5-mistral-7b-instruct`, etc.
   - ✅︎
   - ✅︎
-* - :code:`Qwen2Model`, :code:`Qwen2ForCausalLM`
+* - `Qwen2Model`, `Qwen2ForCausalLM`
   - Qwen2-based
-  - :code:`ssmits/Qwen2-7B-Instruct-embed-base` (see note), :code:`Alibaba-NLP/gte-Qwen2-7B-instruct` (see note), etc.
+  - `ssmits/Qwen2-7B-Instruct-embed-base` (see note), `Alibaba-NLP/gte-Qwen2-7B-instruct` (see note), etc.
   - ✅︎
   - ✅︎
-* - :code:`RobertaModel`, :code:`RobertaForMaskedLM`
+* - `RobertaModel`, `RobertaForMaskedLM`
   - RoBERTa-based
-  - :code:`sentence-transformers/all-roberta-large-v1`, :code:`sentence-transformers/all-roberta-large-v1`, etc.
+  - `sentence-transformers/all-roberta-large-v1`, `sentence-transformers/all-roberta-large-v1`, etc.
   -
   -
-* - :code:`XLMRobertaModel`
+* - `XLMRobertaModel`
   - XLM-RoBERTa-based
-  - :code:`intfloat/multilingual-e5-large`, etc.
+  - `intfloat/multilingual-e5-large`, etc.
   -
   -
 ```
@@ -440,29 +438,28 @@ of the whole prompt are extracted from the normalized hidden state corresponding
 
 #### Reward Modeling (`--task reward`)
 
-```{eval-rst}
-.. list-table::
+```{list-table}
 :widths: 25 25 50 5 5
 :header-rows: 1
 
 * - Architecture
   - Models
   - Example HF Models
-  - :ref:`LoRA <lora-adapter>`
-  - :ref:`PP <distributed-serving>`
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
-* - :code:`InternLM2ForRewardModel`
+* - `InternLM2ForRewardModel`
   - InternLM2-based
-  - :code:`internlm/internlm2-1_8b-reward`, :code:`internlm/internlm2-7b-reward`, etc.
+  - `internlm/internlm2-1_8b-reward`, `internlm/internlm2-7b-reward`, etc.
   - ✅︎
   - ✅︎
-* - :code:`LlamaForCausalLM`
+* - `LlamaForCausalLM`
   - Llama-based
-  - :code:`peiyi9979/math-shepherd-mistral-7b-prm`, etc.
+  - `peiyi9979/math-shepherd-mistral-7b-prm`, etc.
   - ✅︎
   - ✅︎
-* - :code:`Qwen2ForRewardModel`
+* - `Qwen2ForRewardModel`
   - Qwen2-based
-  - :code:`Qwen/Qwen2.5-Math-RM-72B`, etc.
+  - `Qwen/Qwen2.5-Math-RM-72B`, etc.
   - ✅︎
   - ✅︎
 ```
@@ -477,24 +474,23 @@ e.g.: {code}`--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 1
 
 #### Classification (`--task classify`)
 
-```{eval-rst}
-.. list-table::
+```{list-table}
 :widths: 25 25 50 5 5
 :header-rows: 1
 
 * - Architecture
   - Models
   - Example HF Models
-  - :ref:`LoRA <lora-adapter>`
-  - :ref:`PP <distributed-serving>`
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
-* - :code:`JambaForSequenceClassification`
+* - `JambaForSequenceClassification`
   - Jamba
-  - :code:`ai21labs/Jamba-tiny-reward-dev`, etc.
+  - `ai21labs/Jamba-tiny-reward-dev`, etc.
   - ✅︎
   - ✅︎
-* - :code:`Qwen2ForSequenceClassification`
+* - `Qwen2ForSequenceClassification`
   - Qwen2-based
-  - :code:`jason9693/Qwen2.5-1.5B-apeach`, etc.
+  - `jason9693/Qwen2.5-1.5B-apeach`, etc.
   - ✅︎
   - ✅︎
 ```
@@ -504,29 +500,28 @@ If your model is not in the above list, we will try to automatically convert the
 
 #### Sentence Pair Scoring (`--task score`)
 
-```{eval-rst}
-.. list-table::
+```{list-table}
 :widths: 25 25 50 5 5
 :header-rows: 1
 
 * - Architecture
   - Models
   - Example HF Models
-  - :ref:`LoRA <lora-adapter>`
-  - :ref:`PP <distributed-serving>`
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
-* - :code:`BertForSequenceClassification`
+* - `BertForSequenceClassification`
   - BERT-based
-  - :code:`cross-encoder/ms-marco-MiniLM-L-6-v2`, etc.
+  - `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc.
   -
   -
-* - :code:`RobertaForSequenceClassification`
+* - `RobertaForSequenceClassification`
   - RoBERTa-based
-  - :code:`cross-encoder/quora-roberta-base`, etc.
+  - `cross-encoder/quora-roberta-base`, etc.
   -
   -
-* - :code:`XLMRobertaForSequenceClassification`
+* - `XLMRobertaForSequenceClassification`
   - XLM-RoBERTa-based
-  - :code:`BAAI/bge-reranker-v2-m3`, etc.
+  - `BAAI/bge-reranker-v2-m3`, etc.
   -
   -
 ```
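Taken together, the four pooling-task tables above give one ready example checkpoint per `--task` value; a small lookup transcribed from those tables (illustrative, e.g. for smoke-testing task flags):

```python
# One example checkpoint per pooling task, transcribed from the
# embed/reward/classify/score tables above. Illustrative only.
EXAMPLE_POOLING_MODELS = {
    "embed": "intfloat/e5-mistral-7b-instruct",
    "reward": "Qwen/Qwen2.5-Math-RM-72B",
    "classify": "jason9693/Qwen2.5-1.5B-apeach",
    "score": "BAAI/bge-reranker-v2-m3",
}

def example_for(task):
    return EXAMPLE_POOLING_MODELS[task]

print(example_for("score"))  # BAAI/bge-reranker-v2-m3
```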
@@ -558,8 +553,7 @@ See [this page](#generative-models) for more information on how to use generativ
 
 #### Text Generation (`--task generate`)
 
-```{eval-rst}
-.. list-table::
+```{list-table}
 :widths: 25 25 15 20 5 5 5
 :header-rows: 1
 
@@ -567,177 +561,174 @@ See [this page](#generative-models) for more information on how to use generativ
   - Models
   - Inputs
   - Example HF Models
-  - :ref:`LoRA <lora-adapter>`
-  - :ref:`PP <distributed-serving>`
-  - V1
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
+  - [V1](gh-issue:8779)
-* - :code:`AriaForConditionalGeneration`
+* - `AriaForConditionalGeneration`
   - Aria
   - T + I
-  - :code:`rhymes-ai/Aria`
+  - `rhymes-ai/Aria`
   -
   - ✅︎
   -
-* - :code:`Blip2ForConditionalGeneration`
+* - `Blip2ForConditionalGeneration`
   - BLIP-2
-  - T + I\ :sup:`E`
+  - T + I<sup>E</sup>
-  - :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
+  - `Salesforce/blip2-opt-2.7b`, `Salesforce/blip2-opt-6.7b`, etc.
   -
   - ✅︎
   -
-* - :code:`ChameleonForConditionalGeneration`
+* - `ChameleonForConditionalGeneration`
   - Chameleon
   - T + I
-  - :code:`facebook/chameleon-7b` etc.
+  - `facebook/chameleon-7b` etc.
   -
   - ✅︎
   -
-* - :code:`FuyuForCausalLM`
+* - `FuyuForCausalLM`
   - Fuyu
   - T + I
-  - :code:`adept/fuyu-8b` etc.
+  - `adept/fuyu-8b` etc.
   -
   - ✅︎
   -
-* - :code:`ChatGLMModel`
+* - `ChatGLMModel`
   - GLM-4V
   - T + I
-  - :code:`THUDM/glm-4v-9b` etc.
+  - `THUDM/glm-4v-9b` etc.
   - ✅︎
   - ✅︎
   -
-* - :code:`H2OVLChatModel`
+* - `H2OVLChatModel`
   - H2OVL
-  - T + I\ :sup:`E+`
+  - T + I<sup>E+</sup>
-  - :code:`h2oai/h2ovl-mississippi-800m`, :code:`h2oai/h2ovl-mississippi-2b`, etc.
+  - `h2oai/h2ovl-mississippi-800m`, `h2oai/h2ovl-mississippi-2b`, etc.
   -
   - ✅︎
   -
-* - :code:`Idefics3ForConditionalGeneration`
+* - `Idefics3ForConditionalGeneration`
   - Idefics3
   - T + I
-  - :code:`HuggingFaceM4/Idefics3-8B-Llama3` etc.
+  - `HuggingFaceM4/Idefics3-8B-Llama3` etc.
   - ✅︎
   -
   -
-* - :code:`InternVLChatModel`
+* - `InternVLChatModel`
   - InternVL 2.5, Mono-InternVL, InternVL 2.0
-  - T + I\ :sup:`E+`
+  - T + I<sup>E+</sup>
-  - :code:`OpenGVLab/InternVL2_5-4B`, :code:`OpenGVLab/Mono-InternVL-2B`, :code:`OpenGVLab/InternVL2-4B`, etc.
+  - `OpenGVLab/InternVL2_5-4B`, `OpenGVLab/Mono-InternVL-2B`, `OpenGVLab/InternVL2-4B`, etc.
   -
   - ✅︎
   - ✅︎
-* - :code:`LlavaForConditionalGeneration`
+* - `LlavaForConditionalGeneration`
   - LLaVA-1.5
-  - T + I\ :sup:`E+`
+  - T + I<sup>E+</sup>
-  - :code:`llava-hf/llava-1.5-7b-hf`, :code:`TIGER-Lab/Mantis-8B-siglip-llama3` (see note), etc.
+  - `llava-hf/llava-1.5-7b-hf`, `TIGER-Lab/Mantis-8B-siglip-llama3` (see note), etc.
   -
   - ✅︎
   - ✅︎
-* - :code:`LlavaNextForConditionalGeneration`
+* - `LlavaNextForConditionalGeneration`
   - LLaVA-NeXT
-  - T + I\ :sup:`E+`
+  - T + I<sup>E+</sup>
-  - :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
+  - `llava-hf/llava-v1.6-mistral-7b-hf`, `llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
   -
   - ✅︎
   -
-* - :code:`LlavaNextVideoForConditionalGeneration`
+* - `LlavaNextVideoForConditionalGeneration`
   - LLaVA-NeXT-Video
   - T + V
-  - :code:`llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
+  - `llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
   -
   - ✅︎
   -
-* - :code:`LlavaOnevisionForConditionalGeneration`
+* - `LlavaOnevisionForConditionalGeneration`
   - LLaVA-Onevision
-  - T + I\ :sup:`+` + V\ :sup:`+`
+  - T + I<sup>+</sup> + V<sup>+</sup>
-  - :code:`llava-hf/llava-onevision-qwen2-7b-ov-hf`, :code:`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
+  - `llava-hf/llava-onevision-qwen2-7b-ov-hf`, `llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
   -
   - ✅︎
   -
-* - :code:`MiniCPMV`
+* - `MiniCPMV`
   - MiniCPM-V
-  - T + I\ :sup:`E+`
+  - T + I<sup>E+</sup>
-  - :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
+  - `openbmb/MiniCPM-V-2` (see note), `openbmb/MiniCPM-Llama3-V-2_5`, `openbmb/MiniCPM-V-2_6`, etc.
   - ✅︎
   - ✅︎
   -
-* - :code:`MllamaForConditionalGeneration`
+* - `MllamaForConditionalGeneration`
   - Llama 3.2
-  - T + I\ :sup:`+`
+  - T + I<sup>+</sup>
-  - :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
+  - `meta-llama/Llama-3.2-90B-Vision-Instruct`, `meta-llama/Llama-3.2-11B-Vision`, etc.
|
||||||
-
|
-
|
||||||
-
|
-
|
||||||
-
|
-
|
||||||
* - :code:`MolmoForCausalLM`
|
* - `MolmoForCausalLM`
|
||||||
- Molmo
|
- Molmo
|
||||||
- T + I
|
- T + I
|
||||||
- :code:`allenai/Molmo-7B-D-0924`, :code:`allenai/Molmo-72B-0924`, etc.
|
- `allenai/Molmo-7B-D-0924`, `allenai/Molmo-72B-0924`, etc.
|
||||||
-
|
-
|
||||||
- ✅︎
|
- ✅︎
|
||||||
- ✅︎
|
- ✅︎
|
||||||
* - :code:`NVLM_D_Model`
|
* - `NVLM_D_Model`
|
||||||
- NVLM-D 1.0
|
- NVLM-D 1.0
|
||||||
- T + I\ :sup:`E+`
|
- T + I<sup>E+</sup>
|
||||||
- :code:`nvidia/NVLM-D-72B`, etc.
|
- `nvidia/NVLM-D-72B`, etc.
|
||||||
-
|
-
|
||||||
- ✅︎
|
- ✅︎
|
||||||
- ✅︎
|
- ✅︎
|
||||||
* - :code:`PaliGemmaForConditionalGeneration`
|
* - `PaliGemmaForConditionalGeneration`
|
||||||
- PaliGemma, PaliGemma 2
|
- PaliGemma, PaliGemma 2
|
||||||
- T + I\ :sup:`E`
|
- T + I<sup>E</sup>
|
||||||
- :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, :code:`google/paligemma2-3b-ft-docci-448`, etc.
|
- `google/paligemma-3b-pt-224`, `google/paligemma-3b-mix-224`, `google/paligemma2-3b-ft-docci-448`, etc.
|
||||||
-
|
-
|
||||||
- ✅︎
|
- ✅︎
|
||||||
-
|
-
|
||||||
* - :code:`Phi3VForCausalLM`
|
* - `Phi3VForCausalLM`
|
||||||
- Phi-3-Vision, Phi-3.5-Vision
|
- Phi-3-Vision, Phi-3.5-Vision
|
||||||
- T + I\ :sup:`E+`
|
- T + I<sup>E+</sup>
|
||||||
- :code:`microsoft/Phi-3-vision-128k-instruct`, :code:`microsoft/Phi-3.5-vision-instruct` etc.
|
- `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
|
||||||
-
|
-
|
||||||
- ✅︎
|
- ✅︎
|
||||||
- ✅︎
|
- ✅︎
|
||||||
* - :code:`PixtralForConditionalGeneration`
|
* - `PixtralForConditionalGeneration`
|
||||||
- Pixtral
|
- Pixtral
|
||||||
- T + I\ :sup:`+`
|
- T + I<sup>+</sup>
|
||||||
- :code:`mistralai/Pixtral-12B-2409`, :code:`mistral-community/pixtral-12b` etc.
|
- `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
|
||||||
-
|
-
|
||||||
- ✅︎
|
- ✅︎
|
||||||
- ✅︎
|
- ✅︎
|
||||||
* - :code:`QWenLMHeadModel`
|
* - `QWenLMHeadModel`
|
||||||
- Qwen-VL
|
- Qwen-VL
|
||||||
- T + I\ :sup:`E+`
|
- T + I<sup>E+</sup>
|
||||||
- :code:`Qwen/Qwen-VL`, :code:`Qwen/Qwen-VL-Chat`, etc.
|
- `Qwen/Qwen-VL`, `Qwen/Qwen-VL-Chat`, etc.
|
||||||
- ✅︎
|
- ✅︎
|
||||||
- ✅︎
|
- ✅︎
|
||||||
-
|
-
|
||||||
* - :code:`Qwen2AudioForConditionalGeneration`
|
* - `Qwen2AudioForConditionalGeneration`
|
||||||
- Qwen2-Audio
|
- Qwen2-Audio
|
||||||
- T + A\ :sup:`+`
|
- T + A<sup>+</sup>
|
||||||
- :code:`Qwen/Qwen2-Audio-7B-Instruct`
|
- `Qwen/Qwen2-Audio-7B-Instruct`
|
||||||
-
|
-
|
||||||
- ✅︎
|
- ✅︎
|
||||||
-
|
-
|
||||||
* - :code:`Qwen2VLForConditionalGeneration`
|
* - `Qwen2VLForConditionalGeneration`
|
||||||
- Qwen2-VL
|
- Qwen2-VL
|
||||||
- T + I\ :sup:`E+` + V\ :sup:`E+`
|
- T + I<sup>E+</sup> + V<sup>E+</sup>
|
||||||
- :code:`Qwen/QVQ-72B-Preview`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
|
- `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
|
||||||
- ✅︎
|
- ✅︎
|
||||||
- ✅︎
|
- ✅︎
|
||||||
-
|
-
|
||||||
* - :code:`UltravoxModel`
|
* - `UltravoxModel`
|
||||||
- Ultravox
|
- Ultravox
|
||||||
- T + A\ :sup:`E+`
|
- T + A<sup>E+</sup>
|
||||||
- :code:`fixie-ai/ultravox-v0_3`
|
- `fixie-ai/ultravox-v0_3`
|
||||||
-
|
-
|
||||||
- ✅︎
|
- ✅︎
|
||||||
-
|
-
|
||||||
```
|
```
|
||||||
|
|
||||||
<sup>E</sup> Pre-computed embeddings can be passed in for this modality.

<sup>+</sup> Multiple items can be passed per text prompt for this modality.

````{important}
To enable multiple multi-modal items per text prompt, you have to set {code}`limit_mm_per_prompt` (offline inference) or `--limit-mm-per-prompt` (online inference).
````
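As a minimal sketch of the `limit_mm_per_prompt` setting (the checkpoint name is just one example from the table above; the engine construction is left commented out because it needs a GPU and a model download):

```python
# Sketch: allow up to 4 images per text prompt in offline inference.
# `limit_mm_per_prompt` maps each modality name to its per-prompt cap.

def make_engine_kwargs(model: str, max_images: int) -> dict:
    """Keyword arguments for vllm.LLM enabling multi-image prompts."""
    return {
        "model": model,
        "limit_mm_per_prompt": {"image": max_images},
    }

kwargs = make_engine_kwargs("llava-hf/llava-1.5-7b-hf", 4)

# from vllm import LLM
# llm = LLM(**kwargs)  # each prompt may now reference up to 4 images

print(kwargs["limit_mm_per_prompt"])  # → {'image': 4}
```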

To get the best results, you should use pooling models that are specifically trained as such.

The following table lists those that are tested in vLLM.

```{list-table}
:widths: 25 25 15 25 5 5
:header-rows: 1

* - Architecture
  - Models
  - Inputs
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `LlavaNextForConditionalGeneration`
  - LLaVA-NeXT-based
  - T / I
  - `royokong/e5-v`
  -
  - ✅︎
* - `Phi3VForCausalLM`
  - Phi-3-Vision-based
  - T + I
  - `TIGER-Lab/VLM2Vec-Full`
  - 🚧
  - ✅︎
* - `Qwen2VLForConditionalGeneration`
  - Qwen2-VL-based
  - T + I
  - `MrLight/dse-qwen2-2b-mrl-v1`
  -
  - ✅︎
```

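Downstream, the embedding vectors these pooling models return are typically compared with cosine similarity. A dependency-free sketch of that comparison (the commented vLLM calls are assumptions about the API and require a GPU to run):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With vLLM, the vectors would come from a pooling model in the table, e.g.:
# from vllm import LLM
# llm = LLM(model="royokong/e5-v", task="embedding")
# vec = llm.encode(["a photo of a cat"])[0].outputs.embedding

print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # → 1.0
```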
_________________

# Model Support Policy

The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:

|
## Values
