[Doc] Convert list tables to MyST (#11594)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent 4fb8e329fd
commit 32b4c63f02
@@ -197,4 +197,4 @@ if __name__ == '__main__':
 ## Known Issues

 - In `v0.5.2`, `v0.5.3`, and `v0.5.3.post1`, there is a bug caused by [zmq](https://github.com/zeromq/pyzmq/issues/2000), which can occasionally cause vLLM to hang depending on the machine configuration. The solution is to upgrade to the latest version of `vllm` to include the [fix](gh-pr:6759).
-- To circumvent a NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234), all vLLM processes will set an environment variable ``NCCL_CUMEM_ENABLE=0`` to disable NCCL's ``cuMem`` allocator. It does not affect performance but only gives memory benefits. When external processes want to set up a NCCL connection with vLLM's processes, they should also set this environment variable, otherwise, inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656).
+- To circumvent a NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234), all vLLM processes will set an environment variable `NCCL_CUMEM_ENABLE=0` to disable NCCL's `cuMem` allocator. It does not affect performance but only gives memory benefits. When external processes want to set up a NCCL connection with vLLM's processes, they should also set this environment variable, otherwise, inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656).
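The `NCCL_CUMEM_ENABLE` note above can be sketched from the external process's side. This is an illustrative snippet, not part of the commit; the point is only that the variable must be set to the same value before any NCCL communicator is created:

```python
import os

# Mirror vLLM's setting before initializing any NCCL connection to its
# workers; a mismatch in the cuMem allocator setting can make NCCL hang
# or crash. The connection step itself (e.g. via torch.distributed) is
# omitted here and would follow this line.
os.environ["NCCL_CUMEM_ENABLE"] = "0"  # must match vLLM's processes

print(os.environ["NCCL_CUMEM_ENABLE"])
```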
@@ -141,24 +141,23 @@ Gaudi2 devices. Configurations that are not listed may or may not work.

 Currently in vLLM for HPU we support four execution modes, depending on selected HPU PyTorch Bridge backend (via `PT_HPU_LAZY_MODE` environment variable), and `--enforce-eager` flag.

-```{eval-rst}
-.. list-table:: vLLM execution modes
-  :widths: 25 25 50
-  :header-rows: 1
-
-  * - ``PT_HPU_LAZY_MODE``
-    - ``enforce_eager``
-    - execution mode
-  * - 0
-    - 0
-    - torch.compile
-  * - 0
-    - 1
-    - PyTorch eager mode
-  * - 1
-    - 0
-    - HPU Graphs
-  * - 1
-    - 1
-    - PyTorch lazy mode
-```
+```{list-table} vLLM execution modes
+:widths: 25 25 50
+:header-rows: 1
+
+* - `PT_HPU_LAZY_MODE`
+  - `enforce_eager`
+  - execution mode
+* - 0
+  - 0
+  - torch.compile
+* - 0
+  - 1
+  - PyTorch eager mode
+* - 1
+  - 0
+  - HPU Graphs
+* - 1
+  - 1
+  - PyTorch lazy mode
+```
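The execution-mode table above can be expressed as a small lookup, which makes the `(PT_HPU_LAZY_MODE, enforce_eager)` combinations easy to check programmatically. This is an illustrative sketch, not an API from vLLM; the default of `PT_HPU_LAZY_MODE=1` when unset is an assumption:

```python
import os

# (PT_HPU_LAZY_MODE, enforce_eager) -> execution mode, as in the table above.
_EXECUTION_MODES = {
    (0, 0): "torch.compile",
    (0, 1): "PyTorch eager mode",
    (1, 0): "HPU Graphs",
    (1, 1): "PyTorch lazy mode",
}

def hpu_execution_mode(enforce_eager: bool) -> str:
    """Resolve the effective execution mode from the env var and the flag."""
    lazy = int(os.environ.get("PT_HPU_LAZY_MODE", "1"))  # assumed default
    return _EXECUTION_MODES[(lazy, int(enforce_eager))]

os.environ["PT_HPU_LAZY_MODE"] = "1"
print(hpu_execution_mode(enforce_eager=False))  # HPU Graphs
```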
@@ -68,30 +68,29 @@ gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
   --service-account SERVICE_ACCOUNT
 ```

-```{eval-rst}
-.. list-table:: Parameter descriptions
-  :header-rows: 1
-
-  * - Parameter name
-    - Description
-  * - QUEUED_RESOURCE_ID
-    - The user-assigned ID of the queued resource request.
-  * - TPU_NAME
-    - The user-assigned name of the TPU which is created when the queued
-      resource request is allocated.
-  * - PROJECT_ID
-    - Your Google Cloud project
-  * - ZONE
-    - The GCP zone where you want to create your Cloud TPU. The value you use
-      depends on the version of TPUs you are using. For more information, see
-      `TPU regions and zones <https://cloud.google.com/tpu/docs/regions-zones>`_
-  * - ACCELERATOR_TYPE
-    - The TPU version you want to use. Specify the TPU version, for example
-      `v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
-      see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
-  * - RUNTIME_VERSION
-    - The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
-  * - SERVICE_ACCOUNT
-    - The email address for your service account. You can find it in the IAM
-      Cloud Console under *Service Accounts*. For example:
-      `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com`
+```{list-table} Parameter descriptions
+:header-rows: 1
+
+* - Parameter name
+  - Description
+* - QUEUED_RESOURCE_ID
+  - The user-assigned ID of the queued resource request.
+* - TPU_NAME
+  - The user-assigned name of the TPU which is created when the queued
+    resource request is allocated.
+* - PROJECT_ID
+  - Your Google Cloud project
+* - ZONE
+  - The GCP zone where you want to create your Cloud TPU. The value you use
+    depends on the version of TPUs you are using. For more information, see
+    `TPU regions and zones <https://cloud.google.com/tpu/docs/regions-zones>`_
+* - ACCELERATOR_TYPE
+  - The TPU version you want to use. Specify the TPU version, for example
+    `v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
+    see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
+* - RUNTIME_VERSION
+  - The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
+* - SERVICE_ACCOUNT
+  - The email address for your service account. You can find it in the IAM
+    Cloud Console under *Service Accounts*. For example:
+    `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com`
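The parameters described in the table above plug into the `gcloud` command shown before it. The following sketch assembles that command from example values; the sample values and the exact flag names (`--node-id`, `--project`, etc.) are assumptions for illustration, not taken from this commit:

```python
# Hypothetical example values for the parameters described above.
params = {
    "QUEUED_RESOURCE_ID": "my-queued-resource",
    "TPU_NAME": "my-tpu",
    "PROJECT_ID": "my-project",
    "ZONE": "us-central2-b",
    "ACCELERATOR_TYPE": "v5litepod-4",
    "RUNTIME_VERSION": "v2-alpha-tpuv5-lite",
    "SERVICE_ACCOUNT": "tpu-service-account@my-project.iam.gserviceaccount.com",
}

# Build the queued-resource creation command as a single string.
command = " ".join([
    "gcloud alpha compute tpus queued-resources create",
    params["QUEUED_RESOURCE_ID"],
    "--node-id", params["TPU_NAME"],
    "--project", params["PROJECT_ID"],
    "--zone", params["ZONE"],
    "--accelerator-type", params["ACCELERATOR_TYPE"],
    "--runtime-version", params["RUNTIME_VERSION"],
    "--service-account", params["SERVICE_ACCOUNT"],
])
print(command)
```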
@@ -72,289 +72,288 @@ See [this page](#generative-models) for more information on how to use generative models.

 #### Text Generation (`--task generate`)

-```{eval-rst}
-.. list-table::
-  :widths: 25 25 50 5 5
-  :header-rows: 1
-
-  * - Architecture
-    - Models
-    - Example HF Models
-    - :ref:`LoRA <lora-adapter>`
-    - :ref:`PP <distributed-serving>`
-  * - :code:`AquilaForCausalLM`
-    - Aquila, Aquila2
-    - :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`ArcticForCausalLM`
-    - Arctic
-    - :code:`Snowflake/snowflake-arctic-base`, :code:`Snowflake/snowflake-arctic-instruct`, etc.
-    -
-    - ✅︎
-  * - :code:`BaiChuanForCausalLM`
-    - Baichuan2, Baichuan
-    - :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`BloomForCausalLM`
-    - BLOOM, BLOOMZ, BLOOMChat
-    - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
-    -
-    - ✅︎
-  * - :code:`BartForConditionalGeneration`
-    - BART
-    - :code:`facebook/bart-base`, :code:`facebook/bart-large-cnn`, etc.
-    -
-    -
-  * - :code:`ChatGLMModel`
-    - ChatGLM
-    - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`CohereForCausalLM`, :code:`Cohere2ForCausalLM`
-    - Command-R
-    - :code:`CohereForAI/c4ai-command-r-v01`, :code:`CohereForAI/c4ai-command-r7b-12-2024`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`DbrxForCausalLM`
-    - DBRX
-    - :code:`databricks/dbrx-base`, :code:`databricks/dbrx-instruct`, etc.
-    -
-    - ✅︎
-  * - :code:`DeciLMForCausalLM`
-    - DeciLM
-    - :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
-    -
-    - ✅︎
-  * - :code:`DeepseekForCausalLM`
-    - DeepSeek
-    - :code:`deepseek-ai/deepseek-llm-67b-base`, :code:`deepseek-ai/deepseek-llm-7b-chat` etc.
-    -
-    - ✅︎
-  * - :code:`DeepseekV2ForCausalLM`
-    - DeepSeek-V2
-    - :code:`deepseek-ai/DeepSeek-V2`, :code:`deepseek-ai/DeepSeek-V2-Chat` etc.
-    -
-    - ✅︎
-  * - :code:`DeepseekV3ForCausalLM`
-    - DeepSeek-V3
-    - :code:`deepseek-ai/DeepSeek-V3-Base`, :code:`deepseek-ai/DeepSeek-V3` etc.
-    -
-    - ✅︎
-  * - :code:`ExaoneForCausalLM`
-    - EXAONE-3
-    - :code:`LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`FalconForCausalLM`
-    - Falcon
-    - :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
-    -
-    - ✅︎
-  * - :code:`FalconMambaForCausalLM`
-    - FalconMamba
-    - :code:`tiiuae/falcon-mamba-7b`, :code:`tiiuae/falcon-mamba-7b-instruct`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`GemmaForCausalLM`
-    - Gemma
-    - :code:`google/gemma-2b`, :code:`google/gemma-7b`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`Gemma2ForCausalLM`
-    - Gemma2
-    - :code:`google/gemma-2-9b`, :code:`google/gemma-2-27b`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`GlmForCausalLM`
-    - GLM-4
-    - :code:`THUDM/glm-4-9b-chat-hf`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`GPT2LMHeadModel`
-    - GPT-2
-    - :code:`gpt2`, :code:`gpt2-xl`, etc.
-    -
-    - ✅︎
-  * - :code:`GPTBigCodeForCausalLM`
-    - StarCoder, SantaCoder, WizardCoder
-    - :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`GPTJForCausalLM`
-    - GPT-J
-    - :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
-    -
-    - ✅︎
-  * - :code:`GPTNeoXForCausalLM`
-    - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
-    - :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
-    -
-    - ✅︎
-  * - :code:`GraniteForCausalLM`
-    - Granite 3.0, Granite 3.1, PowerLM
-    - :code:`ibm-granite/granite-3.0-2b-base`, :code:`ibm-granite/granite-3.1-8b-instruct`, :code:`ibm/PowerLM-3b`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`GraniteMoeForCausalLM`
-    - Granite 3.0 MoE, PowerMoE
-    - :code:`ibm-granite/granite-3.0-1b-a400m-base`, :code:`ibm-granite/granite-3.0-3b-a800m-instruct`, :code:`ibm/PowerMoE-3b`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`GritLM`
-    - GritLM
-    - :code:`parasail-ai/GritLM-7B-vllm`.
-    - ✅︎
-    - ✅︎
-  * - :code:`InternLMForCausalLM`
-    - InternLM
-    - :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`InternLM2ForCausalLM`
-    - InternLM2
-    - :code:`internlm/internlm2-7b`, :code:`internlm/internlm2-chat-7b`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`JAISLMHeadModel`
-    - Jais
-    - :code:`inceptionai/jais-13b`, :code:`inceptionai/jais-13b-chat`, :code:`inceptionai/jais-30b-v3`, :code:`inceptionai/jais-30b-chat-v3`, etc.
-    -
-    - ✅︎
-  * - :code:`JambaForCausalLM`
-    - Jamba
-    - :code:`ai21labs/AI21-Jamba-1.5-Large`, :code:`ai21labs/AI21-Jamba-1.5-Mini`, :code:`ai21labs/Jamba-v0.1`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`LlamaForCausalLM`
-    - Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
-    - :code:`meta-llama/Meta-Llama-3.1-405B-Instruct`, :code:`meta-llama/Meta-Llama-3.1-70B`, :code:`meta-llama/Meta-Llama-3-70B-Instruct`, :code:`meta-llama/Llama-2-70b-hf`, :code:`01-ai/Yi-34B`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`MambaForCausalLM`
-    - Mamba
-    - :code:`state-spaces/mamba-130m-hf`, :code:`state-spaces/mamba-790m-hf`, :code:`state-spaces/mamba-2.8b-hf`, etc.
-    -
-    - ✅︎
-  * - :code:`MiniCPMForCausalLM`
-    - MiniCPM
-    - :code:`openbmb/MiniCPM-2B-sft-bf16`, :code:`openbmb/MiniCPM-2B-dpo-bf16`, :code:`openbmb/MiniCPM-S-1B-sft`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`MiniCPM3ForCausalLM`
-    - MiniCPM3
-    - :code:`openbmb/MiniCPM3-4B`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`MistralForCausalLM`
-    - Mistral, Mistral-Instruct
-    - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`MixtralForCausalLM`
-    - Mixtral-8x7B, Mixtral-8x7B-Instruct
-    - :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, :code:`mistral-community/Mixtral-8x22B-v0.1`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`MPTForCausalLM`
-    - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
-    - :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
-    -
-    - ✅︎
-  * - :code:`NemotronForCausalLM`
-    - Nemotron-3, Nemotron-4, Minitron
-    - :code:`nvidia/Minitron-8B-Base`, :code:`mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`OLMoForCausalLM`
-    - OLMo
-    - :code:`allenai/OLMo-1B-hf`, :code:`allenai/OLMo-7B-hf`, etc.
-    -
-    - ✅︎
-  * - :code:`OLMo2ForCausalLM`
-    - OLMo2
-    - :code:`allenai/OLMo2-7B-1124`, etc.
-    -
-    - ✅︎
-  * - :code:`OLMoEForCausalLM`
-    - OLMoE
-    - :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`OPTForCausalLM`
-    - OPT, OPT-IML
-    - :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
-    -
-    - ✅︎
-  * - :code:`OrionForCausalLM`
-    - Orion
-    - :code:`OrionStarAI/Orion-14B-Base`, :code:`OrionStarAI/Orion-14B-Chat`, etc.
-    -
-    - ✅︎
-  * - :code:`PhiForCausalLM`
-    - Phi
-    - :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`Phi3ForCausalLM`
-    - Phi-3
-    - :code:`microsoft/Phi-3-mini-4k-instruct`, :code:`microsoft/Phi-3-mini-128k-instruct`, :code:`microsoft/Phi-3-medium-128k-instruct`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`Phi3SmallForCausalLM`
-    - Phi-3-Small
-    - :code:`microsoft/Phi-3-small-8k-instruct`, :code:`microsoft/Phi-3-small-128k-instruct`, etc.
-    -
-    - ✅︎
-  * - :code:`PhiMoEForCausalLM`
-    - Phi-3.5-MoE
-    - :code:`microsoft/Phi-3.5-MoE-instruct`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`PersimmonForCausalLM`
-    - Persimmon
-    - :code:`adept/persimmon-8b-base`, :code:`adept/persimmon-8b-chat`, etc.
-    -
-    - ✅︎
-  * - :code:`QWenLMHeadModel`
-    - Qwen
-    - :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`Qwen2ForCausalLM`
-    - Qwen2
-    - :code:`Qwen/QwQ-32B-Preview`, :code:`Qwen/Qwen2-7B-Instruct`, :code:`Qwen/Qwen2-7B`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`Qwen2MoeForCausalLM`
-    - Qwen2MoE
-    - :code:`Qwen/Qwen1.5-MoE-A2.7B`, :code:`Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
-    -
-    - ✅︎
-  * - :code:`StableLmForCausalLM`
-    - StableLM
-    - :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
-    -
-    - ✅︎
-  * - :code:`Starcoder2ForCausalLM`
-    - Starcoder2
-    - :code:`bigcode/starcoder2-3b`, :code:`bigcode/starcoder2-7b`, :code:`bigcode/starcoder2-15b`, etc.
-    -
-    - ✅︎
-  * - :code:`SolarForCausalLM`
-    - Solar Pro
-    - :code:`upstage/solar-pro-preview-instruct`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`TeleChat2ForCausalLM`
-    - TeleChat2
-    - :code:`TeleAI/TeleChat2-3B`, :code:`TeleAI/TeleChat2-7B`, :code:`TeleAI/TeleChat2-35B`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`XverseForCausalLM`
-    - XVERSE
-    - :code:`xverse/XVERSE-7B-Chat`, :code:`xverse/XVERSE-13B-Chat`, :code:`xverse/XVERSE-65B-Chat`, etc.
-    - ✅︎
-    - ✅︎
-```
+```{list-table}
+:widths: 25 25 50 5 5
+:header-rows: 1
+
+* - Architecture
+  - Models
+  - Example HF Models
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
+* - `AquilaForCausalLM`
+  - Aquila, Aquila2
+  - `BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc.
+  - ✅︎
+  - ✅︎
+* - `ArcticForCausalLM`
+  - Arctic
+  - `Snowflake/snowflake-arctic-base`, `Snowflake/snowflake-arctic-instruct`, etc.
+  -
+  - ✅︎
+* - `BaiChuanForCausalLM`
+  - Baichuan2, Baichuan
+  - `baichuan-inc/Baichuan2-13B-Chat`, `baichuan-inc/Baichuan-7B`, etc.
+  - ✅︎
+  - ✅︎
+* - `BloomForCausalLM`
+  - BLOOM, BLOOMZ, BLOOMChat
+  - `bigscience/bloom`, `bigscience/bloomz`, etc.
+  -
+  - ✅︎
+* - `BartForConditionalGeneration`
+  - BART
+  - `facebook/bart-base`, `facebook/bart-large-cnn`, etc.
+  -
+  -
+* - `ChatGLMModel`
+  - ChatGLM
+  - `THUDM/chatglm2-6b`, `THUDM/chatglm3-6b`, etc.
+  - ✅︎
+  - ✅︎
+* - `CohereForCausalLM`, `Cohere2ForCausalLM`
+  - Command-R
+  - `CohereForAI/c4ai-command-r-v01`, `CohereForAI/c4ai-command-r7b-12-2024`, etc.
+  - ✅︎
+  - ✅︎
+* - `DbrxForCausalLM`
+  - DBRX
+  - `databricks/dbrx-base`, `databricks/dbrx-instruct`, etc.
+  -
+  - ✅︎
+* - `DeciLMForCausalLM`
+  - DeciLM
+  - `Deci/DeciLM-7B`, `Deci/DeciLM-7B-instruct`, etc.
+  -
+  - ✅︎
+* - `DeepseekForCausalLM`
+  - DeepSeek
+  - `deepseek-ai/deepseek-llm-67b-base`, `deepseek-ai/deepseek-llm-7b-chat` etc.
+  -
+  - ✅︎
+* - `DeepseekV2ForCausalLM`
+  - DeepSeek-V2
+  - `deepseek-ai/DeepSeek-V2`, `deepseek-ai/DeepSeek-V2-Chat` etc.
+  -
+  - ✅︎
+* - `DeepseekV3ForCausalLM`
+  - DeepSeek-V3
+  - `deepseek-ai/DeepSeek-V3-Base`, `deepseek-ai/DeepSeek-V3` etc.
+  -
+  - ✅︎
+* - `ExaoneForCausalLM`
+  - EXAONE-3
+  - `LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
+  - ✅︎
+  - ✅︎
+* - `FalconForCausalLM`
+  - Falcon
+  - `tiiuae/falcon-7b`, `tiiuae/falcon-40b`, `tiiuae/falcon-rw-7b`, etc.
+  -
+  - ✅︎
+* - `FalconMambaForCausalLM`
+  - FalconMamba
+  - `tiiuae/falcon-mamba-7b`, `tiiuae/falcon-mamba-7b-instruct`, etc.
+  - ✅︎
+  - ✅︎
+* - `GemmaForCausalLM`
+  - Gemma
+  - `google/gemma-2b`, `google/gemma-7b`, etc.
+  - ✅︎
+  - ✅︎
+* - `Gemma2ForCausalLM`
+  - Gemma2
+  - `google/gemma-2-9b`, `google/gemma-2-27b`, etc.
+  - ✅︎
+  - ✅︎
+* - `GlmForCausalLM`
+  - GLM-4
+  - `THUDM/glm-4-9b-chat-hf`, etc.
+  - ✅︎
+  - ✅︎
+* - `GPT2LMHeadModel`
+  - GPT-2
+  - `gpt2`, `gpt2-xl`, etc.
+  -
+  - ✅︎
+* - `GPTBigCodeForCausalLM`
+  - StarCoder, SantaCoder, WizardCoder
+  - `bigcode/starcoder`, `bigcode/gpt_bigcode-santacoder`, `WizardLM/WizardCoder-15B-V1.0`, etc.
+  - ✅︎
+  - ✅︎
+* - `GPTJForCausalLM`
+  - GPT-J
+  - `EleutherAI/gpt-j-6b`, `nomic-ai/gpt4all-j`, etc.
+  -
+  - ✅︎
+* - `GPTNeoXForCausalLM`
+  - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
+  - `EleutherAI/gpt-neox-20b`, `EleutherAI/pythia-12b`, `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, `databricks/dolly-v2-12b`, `stabilityai/stablelm-tuned-alpha-7b`, etc.
+  -
+  - ✅︎
+* - `GraniteForCausalLM`
+  - Granite 3.0, Granite 3.1, PowerLM
+  - `ibm-granite/granite-3.0-2b-base`, `ibm-granite/granite-3.1-8b-instruct`, `ibm/PowerLM-3b`, etc.
+  - ✅︎
+  - ✅︎
+* - `GraniteMoeForCausalLM`
+  - Granite 3.0 MoE, PowerMoE
+  - `ibm-granite/granite-3.0-1b-a400m-base`, `ibm-granite/granite-3.0-3b-a800m-instruct`, `ibm/PowerMoE-3b`, etc.
+  - ✅︎
+  - ✅︎
+* - `GritLM`
+  - GritLM
+  - `parasail-ai/GritLM-7B-vllm`.
+  - ✅︎
+  - ✅︎
+* - `InternLMForCausalLM`
+  - InternLM
+  - `internlm/internlm-7b`, `internlm/internlm-chat-7b`, etc.
+  - ✅︎
+  - ✅︎
+* - `InternLM2ForCausalLM`
+  - InternLM2
+  - `internlm/internlm2-7b`, `internlm/internlm2-chat-7b`, etc.
+  - ✅︎
+  - ✅︎
+* - `JAISLMHeadModel`
+  - Jais
+  - `inceptionai/jais-13b`, `inceptionai/jais-13b-chat`, `inceptionai/jais-30b-v3`, `inceptionai/jais-30b-chat-v3`, etc.
+  -
+  - ✅︎
+* - `JambaForCausalLM`
+  - Jamba
+  - `ai21labs/AI21-Jamba-1.5-Large`, `ai21labs/AI21-Jamba-1.5-Mini`, `ai21labs/Jamba-v0.1`, etc.
+  - ✅︎
+  - ✅︎
+* - `LlamaForCausalLM`
+  - Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
+  - `meta-llama/Meta-Llama-3.1-405B-Instruct`, `meta-llama/Meta-Llama-3.1-70B`, `meta-llama/Meta-Llama-3-70B-Instruct`, `meta-llama/Llama-2-70b-hf`, `01-ai/Yi-34B`, etc.
+  - ✅︎
+  - ✅︎
+* - `MambaForCausalLM`
+  - Mamba
+  - `state-spaces/mamba-130m-hf`, `state-spaces/mamba-790m-hf`, `state-spaces/mamba-2.8b-hf`, etc.
+  -
+  - ✅︎
+* - `MiniCPMForCausalLM`
+  - MiniCPM
+  - `openbmb/MiniCPM-2B-sft-bf16`, `openbmb/MiniCPM-2B-dpo-bf16`, `openbmb/MiniCPM-S-1B-sft`, etc.
+  - ✅︎
+  - ✅︎
+* - `MiniCPM3ForCausalLM`
+  - MiniCPM3
+  - `openbmb/MiniCPM3-4B`, etc.
+  - ✅︎
+  - ✅︎
+* - `MistralForCausalLM`
+  - Mistral, Mistral-Instruct
+  - `mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.1`, etc.
+  - ✅︎
+  - ✅︎
+* - `MixtralForCausalLM`
+  - Mixtral-8x7B, Mixtral-8x7B-Instruct
+  - `mistralai/Mixtral-8x7B-v0.1`, `mistralai/Mixtral-8x7B-Instruct-v0.1`, `mistral-community/Mixtral-8x22B-v0.1`, etc.
+  - ✅︎
+  - ✅︎
+* - `MPTForCausalLM`
+  - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
+  - `mosaicml/mpt-7b`, `mosaicml/mpt-7b-storywriter`, `mosaicml/mpt-30b`, etc.
+  -
+  - ✅︎
+* - `NemotronForCausalLM`
+  - Nemotron-3, Nemotron-4, Minitron
+  - `nvidia/Minitron-8B-Base`, `mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
+  - ✅︎
+  - ✅︎
+* - `OLMoForCausalLM`
+  - OLMo
+  - `allenai/OLMo-1B-hf`, `allenai/OLMo-7B-hf`, etc.
+  -
+  - ✅︎
+* - `OLMo2ForCausalLM`
+  - OLMo2
+  - `allenai/OLMo2-7B-1124`, etc.
+  -
+  - ✅︎
+* - `OLMoEForCausalLM`
+  - OLMoE
+  - `allenai/OLMoE-1B-7B-0924`, `allenai/OLMoE-1B-7B-0924-Instruct`, etc.
+  - ✅︎
+  - ✅︎
+* - `OPTForCausalLM`
+  - OPT, OPT-IML
+  - `facebook/opt-66b`, `facebook/opt-iml-max-30b`, etc.
+  -
+  - ✅︎
+* - `OrionForCausalLM`
+  - Orion
+  - `OrionStarAI/Orion-14B-Base`, `OrionStarAI/Orion-14B-Chat`, etc.
+  -
+  - ✅︎
+* - `PhiForCausalLM`
+  - Phi
+  - `microsoft/phi-1_5`, `microsoft/phi-2`, etc.
+  - ✅︎
+  - ✅︎
+* - `Phi3ForCausalLM`
+  - Phi-3
+  - `microsoft/Phi-3-mini-4k-instruct`, `microsoft/Phi-3-mini-128k-instruct`, `microsoft/Phi-3-medium-128k-instruct`, etc.
+  - ✅︎
+  - ✅︎
+* - `Phi3SmallForCausalLM`
+  - Phi-3-Small
+  - `microsoft/Phi-3-small-8k-instruct`, `microsoft/Phi-3-small-128k-instruct`, etc.
+  -
+  - ✅︎
+* - `PhiMoEForCausalLM`
+  - Phi-3.5-MoE
+  - `microsoft/Phi-3.5-MoE-instruct`, etc.
+  - ✅︎
+  - ✅︎
+* - `PersimmonForCausalLM`
+  - Persimmon
+  - `adept/persimmon-8b-base`, `adept/persimmon-8b-chat`, etc.
+  -
+  - ✅︎
+* - `QWenLMHeadModel`
+  - Qwen
+  - `Qwen/Qwen-7B`, `Qwen/Qwen-7B-Chat`, etc.
+  - ✅︎
+  - ✅︎
+* - `Qwen2ForCausalLM`
+  - Qwen2
+  - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
+  - ✅︎
+  - ✅︎
+* - `Qwen2MoeForCausalLM`
+  - Qwen2MoE
+  - `Qwen/Qwen1.5-MoE-A2.7B`, `Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
+  -
+  - ✅︎
+* - `StableLmForCausalLM`
+  - StableLM
+  - `stabilityai/stablelm-3b-4e1t`, `stabilityai/stablelm-base-alpha-7b-v2`, etc.
+  -
+  - ✅︎
+* - `Starcoder2ForCausalLM`
+  - Starcoder2
+  - `bigcode/starcoder2-3b`, `bigcode/starcoder2-7b`, `bigcode/starcoder2-15b`, etc.
+  -
+  - ✅︎
+* - `SolarForCausalLM`
+  - Solar Pro
+  - `upstage/solar-pro-preview-instruct`, etc.
+  - ✅︎
+  - ✅︎
+* - `TeleChat2ForCausalLM`
+  - TeleChat2
+  - `TeleAI/TeleChat2-3B`, `TeleAI/TeleChat2-7B`, `TeleAI/TeleChat2-35B`, etc.
+  - ✅︎
+  - ✅︎
+* - `XverseForCausalLM`
+  - XVERSE
+  - `xverse/XVERSE-7B-Chat`, `xverse/XVERSE-13B-Chat`, `xverse/XVERSE-65B-Chat`, etc.
+  - ✅︎
+  - ✅︎
+```
@@ -374,49 +373,48 @@ you should explicitly specify the task type to ensure that the model is used in

 #### Text Embedding (`--task embed`)

-```{eval-rst}
-.. list-table::
-  :widths: 25 25 50 5 5
-  :header-rows: 1
-
-  * - Architecture
-    - Models
-    - Example HF Models
-    - :ref:`LoRA <lora-adapter>`
-    - :ref:`PP <distributed-serving>`
-  * - :code:`BertModel`
-    - BERT-based
-    - :code:`BAAI/bge-base-en-v1.5`, etc.
-    -
-    -
-  * - :code:`Gemma2Model`
-    - Gemma2-based
-    - :code:`BAAI/bge-multilingual-gemma2`, etc.
-    -
-    - ✅︎
-  * - :code:`GritLM`
-    - GritLM
-    - :code:`parasail-ai/GritLM-7B-vllm`.
-    - ✅︎
-    - ✅︎
-  * - :code:`LlamaModel`, :code:`LlamaForCausalLM`, :code:`MistralModel`, etc.
-    - Llama-based
-    - :code:`intfloat/e5-mistral-7b-instruct`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`Qwen2Model`, :code:`Qwen2ForCausalLM`
-    - Qwen2-based
-    - :code:`ssmits/Qwen2-7B-Instruct-embed-base` (see note), :code:`Alibaba-NLP/gte-Qwen2-7B-instruct` (see note), etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`RobertaModel`, :code:`RobertaForMaskedLM`
-    - RoBERTa-based
-    - :code:`sentence-transformers/all-roberta-large-v1`, :code:`sentence-transformers/all-roberta-large-v1`, etc.
-    -
-    -
-  * - :code:`XLMRobertaModel`
-    - XLM-RoBERTa-based
-    - :code:`intfloat/multilingual-e5-large`, etc.
-    -
-    -
-```
+```{list-table}
+:widths: 25 25 50 5 5
+:header-rows: 1
+
+* - Architecture
+  - Models
+  - Example HF Models
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
+* - `BertModel`
+  - BERT-based
+  - `BAAI/bge-base-en-v1.5`, etc.
+  -
+  -
+* - `Gemma2Model`
+  - Gemma2-based
+  - `BAAI/bge-multilingual-gemma2`, etc.
+  -
+  - ✅︎
+* - `GritLM`
+  - GritLM
+  - `parasail-ai/GritLM-7B-vllm`.
+  - ✅︎
+  - ✅︎
+* - `LlamaModel`, `LlamaForCausalLM`, `MistralModel`, etc.
+  - Llama-based
+  - `intfloat/e5-mistral-7b-instruct`, etc.
+  - ✅︎
+  - ✅︎
+* - `Qwen2Model`, `Qwen2ForCausalLM`
+  - Qwen2-based
+  - `ssmits/Qwen2-7B-Instruct-embed-base` (see note), `Alibaba-NLP/gte-Qwen2-7B-instruct` (see note), etc.
+  - ✅︎
+  - ✅︎
+* - `RobertaModel`, `RobertaForMaskedLM`
+  - RoBERTa-based
+  - `sentence-transformers/all-roberta-large-v1`, `sentence-transformers/all-roberta-large-v1`, etc.
+  -
+  -
+* - `XLMRobertaModel`
+  - XLM-RoBERTa-based
+  - `intfloat/multilingual-e5-large`, etc.
+  -
+  -
+```
@@ -440,29 +438,28 @@ of the whole prompt are extracted from the normalized hidden state corresponding

 #### Reward Modeling (`--task reward`)

-```{eval-rst}
-.. list-table::
-  :widths: 25 25 50 5 5
-  :header-rows: 1
-
-  * - Architecture
-    - Models
-    - Example HF Models
-    - :ref:`LoRA <lora-adapter>`
-    - :ref:`PP <distributed-serving>`
-  * - :code:`InternLM2ForRewardModel`
-    - InternLM2-based
-    - :code:`internlm/internlm2-1_8b-reward`, :code:`internlm/internlm2-7b-reward`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`LlamaForCausalLM`
-    - Llama-based
-    - :code:`peiyi9979/math-shepherd-mistral-7b-prm`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`Qwen2ForRewardModel`
-    - Qwen2-based
-    - :code:`Qwen/Qwen2.5-Math-RM-72B`, etc.
-    - ✅︎
-    - ✅︎
-```
+```{list-table}
+:widths: 25 25 50 5 5
+:header-rows: 1
+
+* - Architecture
+  - Models
+  - Example HF Models
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
+* - `InternLM2ForRewardModel`
+  - InternLM2-based
+  - `internlm/internlm2-1_8b-reward`, `internlm/internlm2-7b-reward`, etc.
+  - ✅︎
+  - ✅︎
+* - `LlamaForCausalLM`
+  - Llama-based
+  - `peiyi9979/math-shepherd-mistral-7b-prm`, etc.
+  - ✅︎
+  - ✅︎
+* - `Qwen2ForRewardModel`
+  - Qwen2-based
+  - `Qwen/Qwen2.5-Math-RM-72B`, etc.
+  - ✅︎
+  - ✅︎
+```
@@ -477,24 +474,23 @@ e.g.: {code}`--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 1

 #### Classification (`--task classify`)

-```{eval-rst}
-.. list-table::
-  :widths: 25 25 50 5 5
-  :header-rows: 1
-
-  * - Architecture
-    - Models
-    - Example HF Models
-    - :ref:`LoRA <lora-adapter>`
-    - :ref:`PP <distributed-serving>`
-  * - :code:`JambaForSequenceClassification`
-    - Jamba
-    - :code:`ai21labs/Jamba-tiny-reward-dev`, etc.
-    - ✅︎
-    - ✅︎
-  * - :code:`Qwen2ForSequenceClassification`
-    - Qwen2-based
-    - :code:`jason9693/Qwen2.5-1.5B-apeach`, etc.
-    - ✅︎
-    - ✅︎
-```
+```{list-table}
+:widths: 25 25 50 5 5
+:header-rows: 1
+
+* - Architecture
+  - Models
+  - Example HF Models
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
+* - `JambaForSequenceClassification`
+  - Jamba
+  - `ai21labs/Jamba-tiny-reward-dev`, etc.
+  - ✅︎
+  - ✅︎
+* - `Qwen2ForSequenceClassification`
+  - Qwen2-based
+  - `jason9693/Qwen2.5-1.5B-apeach`, etc.
+  - ✅︎
+  - ✅︎
+```
@@ -504,29 +500,28 @@ If your model is not in the above list, we will try to automatically convert the

 #### Sentence Pair Scoring (`--task score`)

-```{eval-rst}
-.. list-table::
-  :widths: 25 25 50 5 5
-  :header-rows: 1
-
-  * - Architecture
-    - Models
-    - Example HF Models
-    - :ref:`LoRA <lora-adapter>`
-    - :ref:`PP <distributed-serving>`
-  * - :code:`BertForSequenceClassification`
-    - BERT-based
-    - :code:`cross-encoder/ms-marco-MiniLM-L-6-v2`, etc.
-    -
-    -
-  * - :code:`RobertaForSequenceClassification`
-    - RoBERTa-based
-    - :code:`cross-encoder/quora-roberta-base`, etc.
-    -
-    -
-  * - :code:`XLMRobertaForSequenceClassification`
-    - XLM-RoBERTa-based
-    - :code:`BAAI/bge-reranker-v2-m3`, etc.
-    -
-    -
-```
+```{list-table}
+:widths: 25 25 50 5 5
+:header-rows: 1
+
+* - Architecture
+  - Models
+  - Example HF Models
+  - [LoRA](#lora-adapter)
+  - [PP](#distributed-serving)
+* - `BertForSequenceClassification`
+  - BERT-based
+  - `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc.
+  -
+  -
+* - `RobertaForSequenceClassification`
+  - RoBERTa-based
+  - `cross-encoder/quora-roberta-base`, etc.
+  -
+  -
+* - `XLMRobertaForSequenceClassification`
+  - XLM-RoBERTa-based
+  - `BAAI/bge-reranker-v2-m3`, etc.
+  -
+  -
+```
@ -558,186 +553,182 @@ See [this page](#generative-models) for more information on how to use generativ
|
|||||||
|
|
||||||
#### Text Generation (`--task generate`)
|
#### Text Generation (`--task generate`)
|
||||||
|
|
||||||
```{list-table}
:widths: 25 25 15 20 5 5 5
:header-rows: 1

* - Architecture
  - Models
  - Inputs
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
  - [V1](gh-issue:8779)
* - `AriaForConditionalGeneration`
  - Aria
  - T + I
  - `rhymes-ai/Aria`
  -
  - ✅︎
  -
* - `Blip2ForConditionalGeneration`
  - BLIP-2
  - T + I<sup>E</sup>
  - `Salesforce/blip2-opt-2.7b`, `Salesforce/blip2-opt-6.7b`, etc.
  -
  - ✅︎
  -
* - `ChameleonForConditionalGeneration`
  - Chameleon
  - T + I
  - `facebook/chameleon-7b` etc.
  -
  - ✅︎
  -
* - `FuyuForCausalLM`
  - Fuyu
  - T + I
  - `adept/fuyu-8b` etc.
  -
  - ✅︎
  -
* - `ChatGLMModel`
  - GLM-4V
  - T + I
  - `THUDM/glm-4v-9b` etc.
  - ✅︎
  - ✅︎
  -
* - `H2OVLChatModel`
  - H2OVL
  - T + I<sup>E+</sup>
  - `h2oai/h2ovl-mississippi-800m`, `h2oai/h2ovl-mississippi-2b`, etc.
  -
  - ✅︎
  -
* - `Idefics3ForConditionalGeneration`
  - Idefics3
  - T + I
  - `HuggingFaceM4/Idefics3-8B-Llama3` etc.
  - ✅︎
  -
  -
* - `InternVLChatModel`
  - InternVL 2.5, Mono-InternVL, InternVL 2.0
  - T + I<sup>E+</sup>
  - `OpenGVLab/InternVL2_5-4B`, `OpenGVLab/Mono-InternVL-2B`, `OpenGVLab/InternVL2-4B`, etc.
  -
  - ✅︎
  - ✅︎
* - `LlavaForConditionalGeneration`
  - LLaVA-1.5
  - T + I<sup>E+</sup>
  - `llava-hf/llava-1.5-7b-hf`, `TIGER-Lab/Mantis-8B-siglip-llama3` (see note), etc.
  -
  - ✅︎
  - ✅︎
* - `LlavaNextForConditionalGeneration`
  - LLaVA-NeXT
  - T + I<sup>E+</sup>
  - `llava-hf/llava-v1.6-mistral-7b-hf`, `llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
  -
  - ✅︎
  -
* - `LlavaNextVideoForConditionalGeneration`
  - LLaVA-NeXT-Video
  - T + V
  - `llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
  -
  - ✅︎
  -
* - `LlavaOnevisionForConditionalGeneration`
  - LLaVA-Onevision
  - T + I<sup>+</sup> + V<sup>+</sup>
  - `llava-hf/llava-onevision-qwen2-7b-ov-hf`, `llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
  -
  - ✅︎
  -
* - `MiniCPMV`
  - MiniCPM-V
  - T + I<sup>E+</sup>
  - `openbmb/MiniCPM-V-2` (see note), `openbmb/MiniCPM-Llama3-V-2_5`, `openbmb/MiniCPM-V-2_6`, etc.
  - ✅︎
  - ✅︎
  -
* - `MllamaForConditionalGeneration`
  - Llama 3.2
  - T + I<sup>+</sup>
  - `meta-llama/Llama-3.2-90B-Vision-Instruct`, `meta-llama/Llama-3.2-11B-Vision`, etc.
  -
  -
  -
* - `MolmoForCausalLM`
  - Molmo
  - T + I
  - `allenai/Molmo-7B-D-0924`, `allenai/Molmo-72B-0924`, etc.
  -
  - ✅︎
  - ✅︎
* - `NVLM_D_Model`
  - NVLM-D 1.0
  - T + I<sup>E+</sup>
  - `nvidia/NVLM-D-72B`, etc.
  -
  - ✅︎
  - ✅︎
* - `PaliGemmaForConditionalGeneration`
  - PaliGemma, PaliGemma 2
  - T + I<sup>E</sup>
  - `google/paligemma-3b-pt-224`, `google/paligemma-3b-mix-224`, `google/paligemma2-3b-ft-docci-448`, etc.
  -
  - ✅︎
  -
* - `Phi3VForCausalLM`
  - Phi-3-Vision, Phi-3.5-Vision
  - T + I<sup>E+</sup>
  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
  -
  - ✅︎
  - ✅︎
* - `PixtralForConditionalGeneration`
  - Pixtral
  - T + I<sup>+</sup>
  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
  -
  - ✅︎
  - ✅︎
* - `QWenLMHeadModel`
  - Qwen-VL
  - T + I<sup>E+</sup>
  - `Qwen/Qwen-VL`, `Qwen/Qwen-VL-Chat`, etc.
  - ✅︎
  - ✅︎
  -
* - `Qwen2AudioForConditionalGeneration`
  - Qwen2-Audio
  - T + A<sup>+</sup>
  - `Qwen/Qwen2-Audio-7B-Instruct`
  -
  - ✅︎
  -
* - `Qwen2VLForConditionalGeneration`
  - Qwen2-VL
  - T + I<sup>E+</sup> + V<sup>E+</sup>
  - `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
  - ✅︎
  - ✅︎
  -
* - `UltravoxModel`
  - Ultravox
  - T + A<sup>E+</sup>
  - `fixie-ai/ultravox-v0_3`
  -
  - ✅︎
  -
```

<sup>E</sup> Pre-computed embeddings can be inputted for this modality.
<sup>+</sup> Multiple items can be inputted per text prompt for this modality.

````{important}
To enable multiple multi-modal items per text prompt, you have to set {code}`limit_mm_per_prompt` (offline inference)
@@ -787,38 +778,37 @@ To get the best results, you should use pooling models that are specifically trained as such.

The following table lists those that are tested in vLLM.
```{list-table}
:widths: 25 25 15 25 5 5
:header-rows: 1

* - Architecture
  - Models
  - Inputs
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `LlavaNextForConditionalGeneration`
  - LLaVA-NeXT-based
  - T / I
  - `royokong/e5-v`
  -
  - ✅︎
* - `Phi3VForCausalLM`
  - Phi-3-Vision-based
  - T + I
  - `TIGER-Lab/VLM2Vec-Full`
  - 🚧
  - ✅︎
* - `Qwen2VLForConditionalGeneration`
  - Qwen2-VL-based
  - T + I
  - `MrLight/dse-qwen2-2b-mrl-v1`
  -
  - ✅︎
```

_________________

# Model Support Policy
@@ -4,12 +4,11 @@

The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:
```{list-table}
:header-rows: 1
:widths: 20 8 8 8 8 8 8 8 8 8 8

* - Implementation
  - Volta
  - Turing
  - Ampere
@@ -20,7 +19,7 @@ The table below shows the compatibility of various quantization implementations
  - x86 CPU
  - AWS Inferentia
  - Google TPU
* - AWQ
  - ✗
  - ✅︎
  - ✅︎
@@ -31,7 +30,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - GPTQ
  - ✅︎
  - ✅︎
  - ✅︎
@@ -42,7 +41,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - Marlin (GPTQ/AWQ/FP8)
  - ✗
  - ✗
  - ✅︎
@@ -53,7 +52,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - INT8 (W8A8)
  - ✗
  - ✅︎
  - ✅︎
@@ -64,7 +63,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - FP8 (W8A8)
  - ✗
  - ✗
  - ✗
@@ -75,7 +74,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - AQLM
  - ✅︎
  - ✅︎
  - ✅︎
@@ -86,7 +85,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - bitsandbytes
  - ✅︎
  - ✅︎
  - ✅︎
@@ -97,7 +96,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - DeepSpeedFP
  - ✅︎
  - ✅︎
  - ✅︎
@@ -108,7 +107,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - GGUF
  - ✅︎
  - ✅︎
  - ✅︎
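For scripting (for example, a CI check that picks a quantization method per GPU generation), the cells of the compatibility table above can be encoded as plain data. This is a hypothetical helper, not part of vLLM: the `COMPAT` dict and `supported` function are assumptions for illustration, and columns elided by the diff (Ada, Hopper, AMD/Intel platforms) are omitted.

```python
# Visible cells of the quantization compatibility table, encoded as data.
# True = ✅︎ (supported), False = ✗ (unsupported). Elided columns are omitted.
COMPAT = {
    "AWQ":         {"Volta": False, "Turing": True,  "Ampere": True,
                    "x86 CPU": True, "AWS Inferentia": False, "Google TPU": False},
    "GPTQ":        {"Volta": True,  "Turing": True,  "Ampere": True,
                    "x86 CPU": True, "AWS Inferentia": False, "Google TPU": False},
    "INT8 (W8A8)": {"Volta": False, "Turing": True,  "Ampere": True,
                    "x86 CPU": True, "AWS Inferentia": False, "Google TPU": False},
    "FP8 (W8A8)":  {"Volta": False, "Turing": False, "Ampere": False,
                    "x86 CPU": False, "AWS Inferentia": False, "Google TPU": False},
}

def supported(method: str, platform: str) -> bool:
    """Return True if the table marks this (method, platform) cell with a check."""
    return COMPAT[method][platform]

print(supported("AWQ", "Volta"))    # False: AWQ kernels need Turing or newer
print(supported("GPTQ", "Ampere"))  # True
```

This is only a lookup over the documented matrix; always confirm against the table itself, since hardware support changes between vLLM releases.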
@@ -43,208 +43,207 @@ chart **including persistent volumes** and deletes the release.

## Values

```{list-table}
:widths: 25 25 25 25
:header-rows: 1

* - Key
  - Type
  - Default
  - Description
* - autoscaling
  - object
  - {"enabled":false,"maxReplicas":100,"minReplicas":1,"targetCPUUtilizationPercentage":80}
  - Autoscaling configuration
* - autoscaling.enabled
  - bool
  - false
  - Enable autoscaling
* - autoscaling.maxReplicas
  - int
  - 100
  - Maximum replicas
* - autoscaling.minReplicas
  - int
  - 1
  - Minimum replicas
* - autoscaling.targetCPUUtilizationPercentage
  - int
  - 80
  - Target CPU utilization for autoscaling
* - configs
  - object
  - {}
  - Configmap
* - containerPort
  - int
  - 8000
  - Container port
* - customObjects
  - list
  - []
  - Custom Objects configuration
* - deploymentStrategy
  - object
  - {}
  - Deployment strategy configuration
* - externalConfigs
  - list
  - []
  - External configuration
* - extraContainers
  - list
  - []
  - Additional containers configuration
* - extraInit
  - object
  - {"pvcStorage":"1Gi","s3modelpath":"relative_s3_model_path/opt-125m", "awsEc2MetadataDisabled": true}
  - Additional configuration for the init container
* - extraInit.pvcStorage
  - string
  - "50Gi"
  - Storage size of the s3
* - extraInit.s3modelpath
  - string
  - "relative_s3_model_path/opt-125m"
  - Path of the model on the s3 which hosts model weights and config files
* - extraInit.awsEc2MetadataDisabled
  - boolean
  - true
  - Disables the use of the Amazon EC2 instance metadata service
* - extraPorts
  - list
  - []
  - Additional ports configuration
* - gpuModels
  - list
  - ["TYPE_GPU_USED"]
  - Type of gpu used
* - image
  - object
  - {"command":["vllm","serve","/data/","--served-model-name","opt-125m","--host","0.0.0.0","--port","8000"],"repository":"vllm/vllm-openai","tag":"latest"}
  - Image configuration
* - image.command
  - list
  - ["vllm","serve","/data/","--served-model-name","opt-125m","--host","0.0.0.0","--port","8000"]
  - Container launch command
* - image.repository
  - string
  - "vllm/vllm-openai"
  - Image repository
* - image.tag
  - string
  - "latest"
  - Image tag
* - livenessProbe
  - object
  - {"failureThreshold":3,"httpGet":{"path":"/health","port":8000},"initialDelaySeconds":15,"periodSeconds":10}
  - Liveness probe configuration
* - livenessProbe.failureThreshold
  - int
  - 3
  - Number of times after which if a probe fails in a row, Kubernetes considers that the overall check has failed: the container is not alive
* - livenessProbe.httpGet
  - object
  - {"path":"/health","port":8000}
  - Configuration of the Kubelet http request on the server
* - livenessProbe.httpGet.path
  - string
  - "/health"
  - Path to access on the HTTP server
* - livenessProbe.httpGet.port
  - int
  - 8000
  - Name or number of the port to access on the container, on which the server is listening
* - livenessProbe.initialDelaySeconds
  - int
  - 15
  - Number of seconds after the container has started before liveness probe is initiated
* - livenessProbe.periodSeconds
  - int
  - 10
  - How often (in seconds) to perform the liveness probe
* - maxUnavailablePodDisruptionBudget
  - string
  - ""
  - Disruption Budget Configuration
* - readinessProbe
  - object
  - {"failureThreshold":3,"httpGet":{"path":"/health","port":8000},"initialDelaySeconds":5,"periodSeconds":5}
  - Readiness probe configuration
* - readinessProbe.failureThreshold
  - int
  - 3
  - Number of times after which if a probe fails in a row, Kubernetes considers that the overall check has failed: the container is not ready
* - readinessProbe.httpGet
  - object
  - {"path":"/health","port":8000}
  - Configuration of the Kubelet http request on the server
* - readinessProbe.httpGet.path
  - string
  - "/health"
  - Path to access on the HTTP server
* - readinessProbe.httpGet.port
  - int
  - 8000
  - Name or number of the port to access on the container, on which the server is listening
* - readinessProbe.initialDelaySeconds
  - int
  - 5
  - Number of seconds after the container has started before readiness probe is initiated
* - readinessProbe.periodSeconds
  - int
  - 5
  - How often (in seconds) to perform the readiness probe
* - replicaCount
  - int
  - 1
  - Number of replicas
* - resources
  - object
  - {"limits":{"cpu":4,"memory":"16Gi","nvidia.com/gpu":1},"requests":{"cpu":4,"memory":"16Gi","nvidia.com/gpu":1}}
  - Resource configuration
* - resources.limits."nvidia.com/gpu"
  - int
  - 1
  - Number of gpus used
* - resources.limits.cpu
  - int
  - 4
  - Number of CPUs
* - resources.limits.memory
  - string
  - "16Gi"
  - CPU memory configuration
* - resources.requests."nvidia.com/gpu"
  - int
  - 1
  - Number of gpus used
* - resources.requests.cpu
  - int
  - 4
  - Number of CPUs
* - resources.requests.memory
  - string
  - "16Gi"
  - CPU memory configuration
* - secrets
  - object
  - {}
  - Secrets configuration
* - serviceName
  - string
  -
  - Service name
* - servicePort
  - int
  - 80
  - Service port
* - labels.environment
  - string
  - test
  - Environment name
* - labels.release
  - string
  - test
  - Release name
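As a usage sketch, a deployment typically overrides only a handful of these keys in a custom values file. The fragment below uses only keys documented in the Values table, with the defaults shown there; the file name `values-custom.yaml` and the model path are placeholders, not part of the chart.

```yaml
# Hypothetical override file, passed as: helm install <release> <chart-path> -f values-custom.yaml
image:
  repository: "vllm/vllm-openai"   # Image repository
  tag: "latest"                    # Image tag
replicaCount: 1                    # Number of replicas
containerPort: 8000                # Container port
servicePort: 80                    # Service port
resources:                         # requests/limits, matching the table defaults
  requests:
    cpu: 4
    memory: "16Gi"
    nvidia.com/gpu: 1
  limits:
    cpu: 4
    memory: "16Gi"
    nvidia.com/gpu: 1
extraInit:
  pvcStorage: "50Gi"                              # Storage size of the s3-backed volume
  s3modelpath: "relative_s3_model_path/opt-125m"  # placeholder model path
  awsEc2MetadataDisabled: true
autoscaling:
  enabled: false
```

Keys omitted here (probes, secrets, labels, and so on) fall back to the chart defaults listed in the table.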