[Doc] Convert list tables to MyST (#11594)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung 2024-12-29 15:56:22 +08:00 committed by GitHub
parent 4fb8e329fd
commit 32b4c63f02
6 changed files with 951 additions and 965 deletions


@@ -197,4 +197,4 @@ if __name__ == '__main__':
## Known Issues
- In `v0.5.2`, `v0.5.3`, and `v0.5.3.post1`, there is a bug caused by [zmq](https://github.com/zeromq/pyzmq/issues/2000), which can occasionally cause vLLM to hang depending on the machine configuration. The solution is to upgrade to the latest version of `vllm`, which includes the [fix](gh-pr:6759).
- To circumvent an NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234), all vLLM processes set the environment variable `NCCL_CUMEM_ENABLE=0` to disable NCCL's `cuMem` allocator. This does not affect performance; it only forgoes a memory optimization. External processes that set up an NCCL connection with vLLM's processes should also set this environment variable; otherwise, the inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656). A minimal sketch follows this list.
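
The following is a minimal sketch of what an external process might do before joining an NCCL group alongside vLLM; the use of `torch.distributed` and the launcher-provided rank variables are assumptions for illustration, not part of vLLM itself.

```python
import os

# Match vLLM's setting before any NCCL communicator is created; a mismatched
# allocator setup between processes can make NCCL hang or crash.
os.environ["NCCL_CUMEM_ENABLE"] = "0"

import torch.distributed as dist

# Illustrative only: RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are assumed
# to be provided by the launcher (e.g. torchrun).
dist.init_process_group(backend="nccl")
```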


@@ -141,24 +141,23 @@ Gaudi2 devices. Configurations that are not listed may or may not work.
Currently, vLLM for HPU supports four execution modes, selected by the HPU PyTorch Bridge backend (via the `PT_HPU_LAZY_MODE` environment variable) and the `--enforce-eager` flag; a usage sketch follows the table.
```{list-table} vLLM execution modes
:widths: 25 25 50
:header-rows: 1

* - `PT_HPU_LAZY_MODE`
  - `enforce_eager`
  - execution mode
* - 0
  - 0
  - torch.compile
* - 0
  - 1
  - PyTorch eager mode
* - 1
  - 0
  - HPU Graphs
* - 1
  - 1
  - PyTorch lazy mode
```
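
For example, to run with HPU Graphs, keep lazy mode enabled and do not enforce eager execution. The sketch below assumes the standard offline `LLM` API; the model name is only an illustration.

```python
import os

# Select the lazy HPU PyTorch Bridge backend before vLLM is imported.
os.environ["PT_HPU_LAZY_MODE"] = "1"

from vllm import LLM

# PT_HPU_LAZY_MODE=1 together with enforce_eager=False selects HPU Graphs
# (see the table above); facebook/opt-125m is just a small example model.
llm = LLM(model="facebook/opt-125m", enforce_eager=False)
print(llm.generate("Hello, my name is")[0].outputs[0].text)
```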


@@ -68,30 +68,29 @@ gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
--service-account SERVICE_ACCOUNT
```
```{list-table} Parameter descriptions
:header-rows: 1

* - Parameter name
  - Description
* - QUEUED_RESOURCE_ID
  - The user-assigned ID of the queued resource request.
* - TPU_NAME
  - The user-assigned name of the TPU which is created when the queued
    resource request is allocated.
* - PROJECT_ID
  - Your Google Cloud project
* - ZONE
  - The GCP zone where you want to create your Cloud TPU. The value you use
    depends on the version of TPUs you are using. For more information, see
    [TPU regions and zones](https://cloud.google.com/tpu/docs/regions-zones)
* - ACCELERATOR_TYPE
  - The TPU version you want to use. Specify the TPU version, for example
    `v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
    see [TPU versions](https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions).
* - RUNTIME_VERSION
  - The TPU VM runtime version to use. For more information see [TPU VM images](https://cloud.google.com/tpu/docs/runtimes).
* - SERVICE_ACCOUNT
  - The email address for your service account. You can find it in the IAM
    Cloud Console under *Service Accounts*. For example:
    `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com`
```


@@ -72,289 +72,288 @@ See [this page](#generative-models) for more information on how to use generativ
#### Text Generation (`--task generate`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `AquilaForCausalLM`
  - Aquila, Aquila2
  - `BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc.
  - ✅︎
  - ✅︎
* - `ArcticForCausalLM`
  - Arctic
  - `Snowflake/snowflake-arctic-base`, `Snowflake/snowflake-arctic-instruct`, etc.
  -
  - ✅︎
* - `BaiChuanForCausalLM`
  - Baichuan2, Baichuan
  - `baichuan-inc/Baichuan2-13B-Chat`, `baichuan-inc/Baichuan-7B`, etc.
  - ✅︎
  - ✅︎
* - `BloomForCausalLM`
  - BLOOM, BLOOMZ, BLOOMChat
  - `bigscience/bloom`, `bigscience/bloomz`, etc.
  -
  - ✅︎
* - `BartForConditionalGeneration`
  - BART
  - `facebook/bart-base`, `facebook/bart-large-cnn`, etc.
  -
  -
* - `ChatGLMModel`
  - ChatGLM
  - `THUDM/chatglm2-6b`, `THUDM/chatglm3-6b`, etc.
  - ✅︎
  - ✅︎
* - `CohereForCausalLM`, `Cohere2ForCausalLM`
  - Command-R
  - `CohereForAI/c4ai-command-r-v01`, `CohereForAI/c4ai-command-r7b-12-2024`, etc.
  - ✅︎
  - ✅︎
* - `DbrxForCausalLM`
  - DBRX
  - `databricks/dbrx-base`, `databricks/dbrx-instruct`, etc.
  -
  - ✅︎
* - `DeciLMForCausalLM`
  - DeciLM
  - `Deci/DeciLM-7B`, `Deci/DeciLM-7B-instruct`, etc.
  -
  - ✅︎
* - `DeepseekForCausalLM`
  - DeepSeek
  - `deepseek-ai/deepseek-llm-67b-base`, `deepseek-ai/deepseek-llm-7b-chat` etc.
  -
  - ✅︎
* - `DeepseekV2ForCausalLM`
  - DeepSeek-V2
  - `deepseek-ai/DeepSeek-V2`, `deepseek-ai/DeepSeek-V2-Chat` etc.
  -
  - ✅︎
* - `DeepseekV3ForCausalLM`
  - DeepSeek-V3
  - `deepseek-ai/DeepSeek-V3-Base`, `deepseek-ai/DeepSeek-V3` etc.
  -
  - ✅︎
* - `ExaoneForCausalLM`
  - EXAONE-3
  - `LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
  - ✅︎
  - ✅︎
* - `FalconForCausalLM`
  - Falcon
  - `tiiuae/falcon-7b`, `tiiuae/falcon-40b`, `tiiuae/falcon-rw-7b`, etc.
  -
  - ✅︎
* - `FalconMambaForCausalLM`
  - FalconMamba
  - `tiiuae/falcon-mamba-7b`, `tiiuae/falcon-mamba-7b-instruct`, etc.
  - ✅︎
  - ✅︎
* - `GemmaForCausalLM`
  - Gemma
  - `google/gemma-2b`, `google/gemma-7b`, etc.
  - ✅︎
  - ✅︎
* - `Gemma2ForCausalLM`
  - Gemma2
  - `google/gemma-2-9b`, `google/gemma-2-27b`, etc.
  - ✅︎
  - ✅︎
* - `GlmForCausalLM`
  - GLM-4
  - `THUDM/glm-4-9b-chat-hf`, etc.
  - ✅︎
  - ✅︎
* - `GPT2LMHeadModel`
  - GPT-2
  - `gpt2`, `gpt2-xl`, etc.
  -
  - ✅︎
* - `GPTBigCodeForCausalLM`
  - StarCoder, SantaCoder, WizardCoder
  - `bigcode/starcoder`, `bigcode/gpt_bigcode-santacoder`, `WizardLM/WizardCoder-15B-V1.0`, etc.
  - ✅︎
  - ✅︎
* - `GPTJForCausalLM`
  - GPT-J
  - `EleutherAI/gpt-j-6b`, `nomic-ai/gpt4all-j`, etc.
  -
  - ✅︎
* - `GPTNeoXForCausalLM`
  - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
  - `EleutherAI/gpt-neox-20b`, `EleutherAI/pythia-12b`, `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, `databricks/dolly-v2-12b`, `stabilityai/stablelm-tuned-alpha-7b`, etc.
  -
  - ✅︎
* - `GraniteForCausalLM`
  - Granite 3.0, Granite 3.1, PowerLM
  - `ibm-granite/granite-3.0-2b-base`, `ibm-granite/granite-3.1-8b-instruct`, `ibm/PowerLM-3b`, etc.
  - ✅︎
  - ✅︎
* - `GraniteMoeForCausalLM`
  - Granite 3.0 MoE, PowerMoE
  - `ibm-granite/granite-3.0-1b-a400m-base`, `ibm-granite/granite-3.0-3b-a800m-instruct`, `ibm/PowerMoE-3b`, etc.
  - ✅︎
  - ✅︎
* - `GritLM`
  - GritLM
  - `parasail-ai/GritLM-7B-vllm`.
  - ✅︎
  - ✅︎
* - `InternLMForCausalLM`
  - InternLM
  - `internlm/internlm-7b`, `internlm/internlm-chat-7b`, etc.
  - ✅︎
  - ✅︎
* - `InternLM2ForCausalLM`
  - InternLM2
  - `internlm/internlm2-7b`, `internlm/internlm2-chat-7b`, etc.
  - ✅︎
  - ✅︎
* - `JAISLMHeadModel`
  - Jais
  - `inceptionai/jais-13b`, `inceptionai/jais-13b-chat`, `inceptionai/jais-30b-v3`, `inceptionai/jais-30b-chat-v3`, etc.
  -
  - ✅︎
* - `JambaForCausalLM`
  - Jamba
  - `ai21labs/AI21-Jamba-1.5-Large`, `ai21labs/AI21-Jamba-1.5-Mini`, `ai21labs/Jamba-v0.1`, etc.
  - ✅︎
  - ✅︎
* - `LlamaForCausalLM`
  - Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
  - `meta-llama/Meta-Llama-3.1-405B-Instruct`, `meta-llama/Meta-Llama-3.1-70B`, `meta-llama/Meta-Llama-3-70B-Instruct`, `meta-llama/Llama-2-70b-hf`, `01-ai/Yi-34B`, etc.
  - ✅︎
  - ✅︎
* - `MambaForCausalLM`
  - Mamba
  - `state-spaces/mamba-130m-hf`, `state-spaces/mamba-790m-hf`, `state-spaces/mamba-2.8b-hf`, etc.
  -
  - ✅︎
* - `MiniCPMForCausalLM`
  - MiniCPM
  - `openbmb/MiniCPM-2B-sft-bf16`, `openbmb/MiniCPM-2B-dpo-bf16`, `openbmb/MiniCPM-S-1B-sft`, etc.
  - ✅︎
  - ✅︎
* - `MiniCPM3ForCausalLM`
  - MiniCPM3
  - `openbmb/MiniCPM3-4B`, etc.
  - ✅︎
  - ✅︎
* - `MistralForCausalLM`
  - Mistral, Mistral-Instruct
  - `mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.1`, etc.
  - ✅︎
  - ✅︎
* - `MixtralForCausalLM`
  - Mixtral-8x7B, Mixtral-8x7B-Instruct
  - `mistralai/Mixtral-8x7B-v0.1`, `mistralai/Mixtral-8x7B-Instruct-v0.1`, `mistral-community/Mixtral-8x22B-v0.1`, etc.
  - ✅︎
  - ✅︎
* - `MPTForCausalLM`
  - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
  - `mosaicml/mpt-7b`, `mosaicml/mpt-7b-storywriter`, `mosaicml/mpt-30b`, etc.
  -
  - ✅︎
* - `NemotronForCausalLM`
  - Nemotron-3, Nemotron-4, Minitron
  - `nvidia/Minitron-8B-Base`, `mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
  - ✅︎
  - ✅︎
* - `OLMoForCausalLM`
  - OLMo
  - `allenai/OLMo-1B-hf`, `allenai/OLMo-7B-hf`, etc.
  -
  - ✅︎
* - `OLMo2ForCausalLM`
  - OLMo2
  - `allenai/OLMo2-7B-1124`, etc.
  -
  - ✅︎
* - `OLMoEForCausalLM`
  - OLMoE
  - `allenai/OLMoE-1B-7B-0924`, `allenai/OLMoE-1B-7B-0924-Instruct`, etc.
  - ✅︎
  - ✅︎
* - `OPTForCausalLM`
  - OPT, OPT-IML
  - `facebook/opt-66b`, `facebook/opt-iml-max-30b`, etc.
  -
  - ✅︎
* - `OrionForCausalLM`
  - Orion
  - `OrionStarAI/Orion-14B-Base`, `OrionStarAI/Orion-14B-Chat`, etc.
  -
  - ✅︎
* - `PhiForCausalLM`
  - Phi
  - `microsoft/phi-1_5`, `microsoft/phi-2`, etc.
  - ✅︎
  - ✅︎
* - `Phi3ForCausalLM`
  - Phi-3
  - `microsoft/Phi-3-mini-4k-instruct`, `microsoft/Phi-3-mini-128k-instruct`, `microsoft/Phi-3-medium-128k-instruct`, etc.
  - ✅︎
  - ✅︎
* - `Phi3SmallForCausalLM`
  - Phi-3-Small
  - `microsoft/Phi-3-small-8k-instruct`, `microsoft/Phi-3-small-128k-instruct`, etc.
  -
  - ✅︎
* - `PhiMoEForCausalLM`
  - Phi-3.5-MoE
  - `microsoft/Phi-3.5-MoE-instruct`, etc.
  - ✅︎
  - ✅︎
* - `PersimmonForCausalLM`
  - Persimmon
  - `adept/persimmon-8b-base`, `adept/persimmon-8b-chat`, etc.
  -
  - ✅︎
* - `QWenLMHeadModel`
  - Qwen
  - `Qwen/Qwen-7B`, `Qwen/Qwen-7B-Chat`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2ForCausalLM`
  - Qwen2
  - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2MoeForCausalLM`
  - Qwen2MoE
  - `Qwen/Qwen1.5-MoE-A2.7B`, `Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
  -
  - ✅︎
* - `StableLmForCausalLM`
  - StableLM
  - `stabilityai/stablelm-3b-4e1t`, `stabilityai/stablelm-base-alpha-7b-v2`, etc.
  -
  - ✅︎
* - `Starcoder2ForCausalLM`
  - Starcoder2
  - `bigcode/starcoder2-3b`, `bigcode/starcoder2-7b`, `bigcode/starcoder2-15b`, etc.
  -
  - ✅︎
* - `SolarForCausalLM`
  - Solar Pro
  - `upstage/solar-pro-preview-instruct`, etc.
  - ✅︎
  - ✅︎
* - `TeleChat2ForCausalLM`
  - TeleChat2
  - `TeleAI/TeleChat2-3B`, `TeleAI/TeleChat2-7B`, `TeleAI/TeleChat2-35B`, etc.
  - ✅︎
  - ✅︎
* - `XverseForCausalLM`
  - XVERSE
  - `xverse/XVERSE-7B-Chat`, `xverse/XVERSE-13B-Chat`, `xverse/XVERSE-65B-Chat`, etc.
  - ✅︎
  - ✅︎
```
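
As a usage sketch, any architecture above can be loaded through the offline `LLM` API; `gpt2` is chosen here only because it is small, and the `task` keyword is assumed to mirror the `--task` CLI flag.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="gpt2", task="generate")
params = SamplingParams(temperature=0.8, max_tokens=32)

# One prompt in, one RequestOutput out; each output holds the generated text.
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```
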
@@ -374,49 +373,48 @@ you should explicitly specify the task type to ensure that the model is used in
#### Text Embedding (`--task embed`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `BertModel`
  - BERT-based
  - `BAAI/bge-base-en-v1.5`, etc.
  -
  -
* - `Gemma2Model`
  - Gemma2-based
  - `BAAI/bge-multilingual-gemma2`, etc.
  -
  - ✅︎
* - `GritLM`
  - GritLM
  - `parasail-ai/GritLM-7B-vllm`.
  - ✅︎
  - ✅︎
* - `LlamaModel`, `LlamaForCausalLM`, `MistralModel`, etc.
  - Llama-based
  - `intfloat/e5-mistral-7b-instruct`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2Model`, `Qwen2ForCausalLM`
  - Qwen2-based
  - `ssmits/Qwen2-7B-Instruct-embed-base` (see note), `Alibaba-NLP/gte-Qwen2-7B-instruct` (see note), etc.
  - ✅︎
  - ✅︎
* - `RobertaModel`, `RobertaForMaskedLM`
  - RoBERTa-based
  - `sentence-transformers/all-roberta-large-v1`, etc.
  -
  -
* - `XLMRobertaModel`
  - XLM-RoBERTa-based
  - `intfloat/multilingual-e5-large`, etc.
  -
  -
```
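
A minimal embedding sketch, assuming the `LLM.embed()` convenience method available in recent vLLM releases and using one model from the table:

```python
from vllm import LLM

# `task="embed"` is assumed to mirror the `--task embed` flag.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")

(output,) = llm.embed("A sentence to embed.")
# The embedding vector is assumed to be exposed as `outputs.embedding`.
print(len(output.outputs.embedding))
```
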
@@ -440,29 +438,28 @@ of the whole prompt are extracted from the normalized hidden state corresponding
#### Reward Modeling (`--task reward`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `InternLM2ForRewardModel`
  - InternLM2-based
  - `internlm/internlm2-1_8b-reward`, `internlm/internlm2-7b-reward`, etc.
  - ✅︎
  - ✅︎
* - `LlamaForCausalLM`
  - Llama-based
  - `peiyi9979/math-shepherd-mistral-7b-prm`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2ForRewardModel`
  - Qwen2-based
  - `Qwen/Qwen2.5-Math-RM-72B`, etc.
  - ✅︎
  - ✅︎
```
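
A hedged sketch of loading a reward model for offline pooling; the `task` keyword is assumed to mirror `--task reward`, and the model is one entry from the table.

```python
from vllm import LLM

# InternLM2 reward checkpoints require trust_remote_code=True.
llm = LLM(
    model="internlm/internlm2-1_8b-reward",
    task="reward",
    trust_remote_code=True,
)

(output,) = llm.encode("The answer is 42.")
# The pooling output object carries the reward values produced by the model head.
print(output.outputs)
```
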
@@ -477,24 +474,23 @@ e.g.: {code}`--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 1
#### Classification (`--task classify`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `JambaForSequenceClassification`
  - Jamba
  - `ai21labs/Jamba-tiny-reward-dev`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2ForSequenceClassification`
  - Qwen2-based
  - `jason9693/Qwen2.5-1.5B-apeach`, etc.
  - ✅︎
  - ✅︎
```
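
A classification sketch, assuming the `LLM.classify()` helper from recent releases and one model from the table:

```python
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", task="classify")

(output,) = llm.classify("vLLM is a high-throughput inference engine.")
# The classification output holds the per-class probabilities.
print(output.outputs)
```
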
@@ -504,29 +500,28 @@ If your model is not in the above list, we will try to automatically convert the
#### Sentence Pair Scoring (`--task score`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `BertForSequenceClassification`
  - BERT-based
  - `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc.
  -
  -
* - `RobertaForSequenceClassification`
  - RoBERTa-based
  - `cross-encoder/quora-roberta-base`, etc.
  -
  -
* - `XLMRobertaForSequenceClassification`
  - XLM-RoBERTa-based
  - `BAAI/bge-reranker-v2-m3`, etc.
  -
  -
```
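
A cross-encoder scoring sketch, assuming the `LLM.score()` helper from recent releases and one model from the table:

```python
from vllm import LLM

llm = LLM(model="cross-encoder/ms-marco-MiniLM-L-6-v2", task="score")

# Score a query against a candidate passage.
(output,) = llm.score(
    "What is the capital of France?",
    "Paris is the capital and most populous city of France.",
)
print(output.outputs)
```
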
@@ -558,186 +553,182 @@ See [this page](#generative-models) for more information on how to use generativ
#### Text Generation (`--task generate`)
```{list-table}
:widths: 25 25 15 20 5 5 5
:header-rows: 1

* - Architecture
  - Models
  - Inputs
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
  - [V1](gh-issue:8779)
* - `AriaForConditionalGeneration`
  - Aria
  - T + I
  - `rhymes-ai/Aria`
  -
  - ✅︎
  -
* - `Blip2ForConditionalGeneration`
  - BLIP-2
  - T + I<sup>E</sup>
  - `Salesforce/blip2-opt-2.7b`, `Salesforce/blip2-opt-6.7b`, etc.
  -
  - ✅︎
  -
* - `ChameleonForConditionalGeneration`
  - Chameleon
  - T + I
  - `facebook/chameleon-7b` etc.
  -
  - ✅︎
  -
* - `FuyuForCausalLM`
  - Fuyu
  - T + I
  - `adept/fuyu-8b` etc.
  -
  - ✅︎
  -
* - `ChatGLMModel`
  - GLM-4V
  - T + I
  - `THUDM/glm-4v-9b` etc.
  - ✅︎
  - ✅︎
  -
* - `H2OVLChatModel`
  - H2OVL
  - T + I<sup>E+</sup>
  - `h2oai/h2ovl-mississippi-800m`, `h2oai/h2ovl-mississippi-2b`, etc.
  -
  - ✅︎
  -
* - `Idefics3ForConditionalGeneration`
  - Idefics3
  - T + I
  - `HuggingFaceM4/Idefics3-8B-Llama3` etc.
  - ✅︎
  -
  -
* - `InternVLChatModel`
  - InternVL 2.5, Mono-InternVL, InternVL 2.0
  - T + I<sup>E+</sup>
  - `OpenGVLab/InternVL2_5-4B`, `OpenGVLab/Mono-InternVL-2B`, `OpenGVLab/InternVL2-4B`, etc.
  -
  - ✅︎
  - ✅︎
* - `LlavaForConditionalGeneration`
  - LLaVA-1.5
  - T + I<sup>E+</sup>
  - `llava-hf/llava-1.5-7b-hf`, `TIGER-Lab/Mantis-8B-siglip-llama3` (see note), etc.
  -
  - ✅︎
  - ✅︎
* - `LlavaNextForConditionalGeneration`
  - LLaVA-NeXT
  - T + I<sup>E+</sup>
  - `llava-hf/llava-v1.6-mistral-7b-hf`, `llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
  -
  - ✅︎
  -
* - `LlavaNextVideoForConditionalGeneration`
  - LLaVA-NeXT-Video
  - T + V
  - `llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
  -
  - ✅︎
  -
* - `LlavaOnevisionForConditionalGeneration`
  - LLaVA-Onevision
  - T + I<sup>+</sup> + V<sup>+</sup>
  - `llava-hf/llava-onevision-qwen2-7b-ov-hf`, `llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
  -
  - ✅︎
  -
* - `MiniCPMV`
  - MiniCPM-V
  - T + I<sup>E+</sup>
  - `openbmb/MiniCPM-V-2` (see note), `openbmb/MiniCPM-Llama3-V-2_5`, `openbmb/MiniCPM-V-2_6`, etc.
  - ✅︎
  - ✅︎
  -
* - `MllamaForConditionalGeneration`
  - Llama 3.2
  - T + I<sup>+</sup>
  - `meta-llama/Llama-3.2-90B-Vision-Instruct`, `meta-llama/Llama-3.2-11B-Vision`, etc.
  -
  -
  -
* - `MolmoForCausalLM`
  - Molmo
  - T + I
  - `allenai/Molmo-7B-D-0924`, `allenai/Molmo-72B-0924`, etc.
  -
  - ✅︎
  - ✅︎
* - `NVLM_D_Model`
  - NVLM-D 1.0
  - T + I<sup>E+</sup>
  - `nvidia/NVLM-D-72B`, etc.
  -
  - ✅︎
  - ✅︎
* - `PaliGemmaForConditionalGeneration`
  - PaliGemma, PaliGemma 2
  - T + I<sup>E</sup>
  - `google/paligemma-3b-pt-224`, `google/paligemma-3b-mix-224`, `google/paligemma2-3b-ft-docci-448`, etc.
  -
  - ✅︎
  -
* - `Phi3VForCausalLM`
  - Phi-3-Vision, Phi-3.5-Vision
  - T + I<sup>E+</sup>
  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
  -
  - ✅︎
  - ✅︎
* - `PixtralForConditionalGeneration`
  - Pixtral
  - T + I<sup>+</sup>
  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
  -
  - ✅︎
  - ✅︎
* - `QWenLMHeadModel`
  - Qwen-VL
  - T + I<sup>E+</sup>
  - `Qwen/Qwen-VL`, `Qwen/Qwen-VL-Chat`, etc.
  - ✅︎
  - ✅︎
  -
* - `Qwen2AudioForConditionalGeneration`
  - Qwen2-Audio
  - T + A<sup>+</sup>
  - `Qwen/Qwen2-Audio-7B-Instruct`
  -
  - ✅︎
  -
* - `Qwen2VLForConditionalGeneration`
  - Qwen2-VL
  - T + I<sup>E+</sup> + V<sup>E+</sup>
  - `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
  - ✅︎
  - ✅︎
  -
* - `UltravoxModel`
  - Ultravox
  - T + A<sup>E+</sup>
  - `fixie-ai/ultravox-v0_3`
  -
  - ✅︎
  -
```
<sup>E</sup> Pre-computed embeddings can be inputted for this modality.
<sup>+</sup> Multiple items can be inputted per text prompt for this modality.
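
A minimal multimodal sketch for a "T + I" model from the table; the image path and the LLaVA-1.5 prompt template are illustrative.

```python
from PIL import Image

from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")
image = Image.open("example.jpg")  # placeholder path

outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is shown in this image? ASSISTANT:",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
```
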
````{important}
To enable multiple multi-modal items per text prompt, you have to set {code}`limit_mm_per_prompt` (offline inference)
@@ -787,38 +778,37 @@ To get the best results, you should use pooling models that are specifically tra
The following table lists those that are tested in vLLM.
```{list-table}
:widths: 25 25 15 25 5 5
:header-rows: 1

* - Architecture
  - Models
  - Inputs
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `LlavaNextForConditionalGeneration`
  - LLaVA-NeXT-based
  - T / I
  - `royokong/e5-v`
  -
  - ✅︎
* - `Phi3VForCausalLM`
  - Phi-3-Vision-based
  - T + I
  - `TIGER-Lab/VLM2Vec-Full`
  - 🚧
  - ✅︎
* - `Qwen2VLForConditionalGeneration`
  - Qwen2-VL-based
  - T + I
  - `MrLight/dse-qwen2-2b-mrl-v1`
  -
  - ✅︎
```
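
A hedged sketch of multimodal embedding with one model from the table; the prompt template is a placeholder (consult the model card for the exact format), and `task="embed"` is assumed to mirror `--task embed`.

```python
from PIL import Image

from vllm import LLM

llm = LLM(model="royokong/e5-v", task="embed")
image = Image.open("example.jpg")  # placeholder path

(output,) = llm.encode({
    "prompt": "<image>\nSummarize the image in one word:",  # illustrative template
    "multi_modal_data": {"image": image},
})
print(output.outputs)
```
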
_________________
# Model Support Policy


@@ -4,12 +4,11 @@
The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:
```{list-table}
:header-rows: 1
:widths: 20 8 8 8 8 8 8 8 8 8 8

* - Implementation
  - Volta
  - Turing
  - Ampere
@@ -20,7 +19,7 @@ The table below shows the compatibility of various quantization implementations
  - x86 CPU
  - AWS Inferentia
  - Google TPU
* - AWQ
  - ✗
  - ✅︎
  - ✅︎
@@ -31,7 +30,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - GPTQ
  - ✅︎
  - ✅︎
  - ✅︎
@@ -42,7 +41,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - Marlin (GPTQ/AWQ/FP8)
  - ✗
  - ✗
  - ✅︎
@@ -53,7 +52,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - INT8 (W8A8)
  - ✗
  - ✅︎
  - ✅︎
@@ -64,7 +63,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - FP8 (W8A8)
  - ✗
  - ✗
  - ✗
@@ -75,7 +74,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - AQLM
  - ✅︎
  - ✅︎
  - ✅︎
@@ -86,7 +85,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - bitsandbytes
  - ✅︎
  - ✅︎
  - ✅︎
@@ -97,7 +96,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - DeepSpeedFP
  - ✅︎
  - ✅︎
  - ✅︎
@@ -108,7 +107,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - GGUF
  - ✅︎
  - ✅︎
  - ✅︎
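
As a usage sketch for the table above, a pre-quantized checkpoint can be loaded directly through the offline `LLM` API; the model name is illustrative, and the explicit `quantization` argument is usually optional because the method is inferred from the checkpoint config.

```python
from vllm import LLM

# Load an AWQ checkpoint; "awq" is one of the implementations listed above.
llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
print(llm.generate("Hello, my name is")[0].outputs[0].text)
```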


@@ -43,208 +43,207 @@ chart **including persistent volumes** and deletes the release.
## Values
```{list-table}
:widths: 25 25 25 25
:header-rows: 1

* - Key
  - Type
  - Default
  - Description
* - autoscaling
  - object
  - {"enabled":false,"maxReplicas":100,"minReplicas":1,"targetCPUUtilizationPercentage":80}
  - Autoscaling configuration
* - autoscaling.enabled
  - bool
  - false
  - Enable autoscaling
* - autoscaling.maxReplicas
  - int
  - 100
  - Maximum replicas
* - autoscaling.minReplicas
  - int
  - 1
  - Minimum replicas
* - autoscaling.targetCPUUtilizationPercentage
  - int
  - 80
  - Target CPU utilization for autoscaling
* - configs
  - object
  - {}
  - Configmap
* - containerPort
  - int
  - 8000
  - Container port
* - customObjects
  - list
  - []
  - Custom Objects configuration
* - deploymentStrategy
  - object
  - {}
  - Deployment strategy configuration
* - externalConfigs
  - list
  - []
  - External configuration
* - extraContainers
  - list
  - []
  - Additional containers configuration
* - extraInit
  - object
  - {"pvcStorage":"1Gi","s3modelpath":"relative_s3_model_path/opt-125m", "awsEc2MetadataDisabled": true}
  - Additional configuration for the init container
* - extraInit.pvcStorage
  - string
  - "50Gi"
  - Storage size of the s3
* - extraInit.s3modelpath
  - string
  - "relative_s3_model_path/opt-125m"
  - Path of the model on the s3 which hosts model weights and config files
* - extraInit.awsEc2MetadataDisabled
  - boolean
  - true
  - Disables the use of the Amazon EC2 instance metadata service
* - extraPorts
  - list
  - []
  - Additional ports configuration
* - gpuModels
  - list
  - ["TYPE_GPU_USED"]
  - Type of gpu used
* - image
  - object
  - {"command":["vllm","serve","/data/","--served-model-name","opt-125m","--host","0.0.0.0","--port","8000"],"repository":"vllm/vllm-openai","tag":"latest"}
  - Image configuration
* - image.command
  - list
  - ["vllm","serve","/data/","--served-model-name","opt-125m","--host","0.0.0.0","--port","8000"]
  - Container launch command
* - image.repository
  - string
  - "vllm/vllm-openai"
  - Image repository
* - image.tag
  - string
  - "latest"
  - Image tag
* - livenessProbe
  - object
  - {"failureThreshold":3,"httpGet":{"path":"/health","port":8000},"initialDelaySeconds":15,"periodSeconds":10}
  - Liveness probe configuration
* - livenessProbe.failureThreshold
  - int
  - 3
  - Number of times after which if a probe fails in a row, Kubernetes considers that the overall check has failed: the container is not alive
* - livenessProbe.httpGet
  - object
  - {"path":"/health","port":8000}
  - Configuration of the Kubelet http request on the server
* - livenessProbe.httpGet.path
  - string
  - "/health"
  - Path to access on the HTTP server
* - livenessProbe.httpGet.port
  - int
  - 8000
  - Name or number of the port to access on the container, on which the server is listening
* - livenessProbe.initialDelaySeconds
  - int
  - 15
  - Number of seconds after the container has started before liveness probe is initiated
* - livenessProbe.periodSeconds
  - int
  - 10
  - How often (in seconds) to perform the liveness probe
* - maxUnavailablePodDisruptionBudget
  - string
  - ""
  - Disruption Budget Configuration
* - readinessProbe
  - object
  - {"failureThreshold":3,"httpGet":{"path":"/health","port":8000},"initialDelaySeconds":5,"periodSeconds":5}
  - Readiness probe configuration
* - readinessProbe.failureThreshold
  - int
  - 3
  - Number of times after which if a probe fails in a row, Kubernetes considers that the overall check has failed: the container is not ready
* - readinessProbe.httpGet
  - object
  - {"path":"/health","port":8000}
  - Configuration of the Kubelet http request on the server
* - readinessProbe.httpGet.path
  - string
  - "/health"
  - Path to access on the HTTP server
* - readinessProbe.httpGet.port
  - int
  - 8000
  - Name or number of the port to access on the container, on which the server is listening
* - readinessProbe.initialDelaySeconds
  - int
  - 5
  - Number of seconds after the container has started before readiness probe is initiated
* - readinessProbe.periodSeconds
  - int
  - 5
  - How often (in seconds) to perform the readiness probe
* - replicaCount
  - int
  - 1
  - Number of replicas
* - resources
  - object
  - {"limits":{"cpu":4,"memory":"16Gi","nvidia.com/gpu":1},"requests":{"cpu":4,"memory":"16Gi","nvidia.com/gpu":1}}
  - Resource configuration
* - resources.limits."nvidia.com/gpu"
  - int
  - 1
  - Number of gpus used
* - resources.limits.cpu
  - int
  - 4
  - Number of CPUs
* - resources.limits.memory
  - string
  - "16Gi"
  - CPU memory configuration
* - resources.requests."nvidia.com/gpu"
  - int
  - 1
  - Number of gpus used
* - resources.requests.cpu
  - int
  - 4
  - Number of CPUs
* - resources.requests.memory
  - string
  - "16Gi"
  - CPU memory configuration
* - secrets
  - object
  - {}
  - Secrets configuration
* - serviceName
  - string
  -
  - Service name
* - servicePort
  - int
  - 80
  - Service port
* - labels.environment
  - string
  - test
  - Environment name
* - labels.release
  - string
  - test
  - Release name
```