[Doc] Convert list tables to MyST (#11594)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung 2024-12-29 15:56:22 +08:00 committed by GitHub
parent 4fb8e329fd
commit 32b4c63f02
6 changed files with 951 additions and 965 deletions


@@ -197,4 +197,4 @@ if __name__ == '__main__':
## Known Issues
- In `v0.5.2`, `v0.5.3`, and `v0.5.3.post1`, there is a bug caused by [zmq](https://github.com/zeromq/pyzmq/issues/2000), which can occasionally cause vLLM to hang depending on the machine configuration. The solution is to upgrade to the latest version of `vllm`, which includes the [fix](gh-pr:6759).
- To circumvent an NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234), all vLLM processes set the environment variable `NCCL_CUMEM_ENABLE=0` to disable NCCL's `cuMem` allocator. This does not affect performance; it only forgoes a memory optimization. External processes that set up an NCCL connection with vLLM's processes should also set this environment variable; otherwise, the inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656). A minimal sketch follows this list.
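
The following is a minimal sketch of what an external process might do before joining an NCCL group alongside vLLM; the use of `torch.distributed` and the launcher-provided rank variables are assumptions for illustration, not part of vLLM itself.

```python
import os

# Match vLLM's setting before any NCCL communicator is created; a mismatched
# allocator setup between processes can make NCCL hang or crash.
os.environ["NCCL_CUMEM_ENABLE"] = "0"

import torch.distributed as dist

# Illustrative only: RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are assumed
# to be provided by the launcher (e.g. torchrun).
dist.init_process_group(backend="nccl")
```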


@@ -141,24 +141,23 @@ Gaudi2 devices. Configurations that are not listed may or may not work.
Currently, vLLM for HPU supports four execution modes, selected by the HPU PyTorch Bridge backend (via the `PT_HPU_LAZY_MODE` environment variable) and the `--enforce-eager` flag; a usage sketch follows the table.
```{list-table} vLLM execution modes
:widths: 25 25 50
:header-rows: 1

* - `PT_HPU_LAZY_MODE`
  - `enforce_eager`
  - execution mode
* - 0
  - 0
  - torch.compile
* - 0
  - 1
  - PyTorch eager mode
* - 1
  - 0
  - HPU Graphs
* - 1
  - 1
  - PyTorch lazy mode
```
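
For example, to run with HPU Graphs, keep lazy mode enabled and do not enforce eager execution. The sketch below assumes the standard offline `LLM` API; the model name is only an illustration.

```python
import os

# Select the lazy HPU PyTorch Bridge backend before vLLM is imported.
os.environ["PT_HPU_LAZY_MODE"] = "1"

from vllm import LLM

# PT_HPU_LAZY_MODE=1 together with enforce_eager=False selects HPU Graphs
# (see the table above); facebook/opt-125m is just a small example model.
llm = LLM(model="facebook/opt-125m", enforce_eager=False)
print(llm.generate("Hello, my name is")[0].outputs[0].text)
```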


@@ -68,30 +68,29 @@ gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
--service-account SERVICE_ACCOUNT
```
```{list-table} Parameter descriptions
:header-rows: 1

* - Parameter name
  - Description
* - QUEUED_RESOURCE_ID
  - The user-assigned ID of the queued resource request.
* - TPU_NAME
  - The user-assigned name of the TPU which is created when the queued
    resource request is allocated.
* - PROJECT_ID
  - Your Google Cloud project
* - ZONE
  - The GCP zone where you want to create your Cloud TPU. The value you use
    depends on the version of TPUs you are using. For more information, see
    [TPU regions and zones](https://cloud.google.com/tpu/docs/regions-zones)
* - ACCELERATOR_TYPE
  - The TPU version you want to use. Specify the TPU version, for example
    `v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
    see [TPU versions](https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions).
* - RUNTIME_VERSION
  - The TPU VM runtime version to use. For more information see [TPU VM images](https://cloud.google.com/tpu/docs/runtimes).
* - SERVICE_ACCOUNT
  - The email address for your service account. You can find it in the IAM
    Cloud Console under *Service Accounts*. For example:
    `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com`
```


@@ -72,289 +72,288 @@ See [this page](#generative-models) for more information on how to use generativ
#### Text Generation (`--task generate`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `AquilaForCausalLM`
  - Aquila, Aquila2
  - `BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc.
  - ✅︎
  - ✅︎
* - `ArcticForCausalLM`
  - Arctic
  - `Snowflake/snowflake-arctic-base`, `Snowflake/snowflake-arctic-instruct`, etc.
  -
  - ✅︎
* - `BaiChuanForCausalLM`
  - Baichuan2, Baichuan
  - `baichuan-inc/Baichuan2-13B-Chat`, `baichuan-inc/Baichuan-7B`, etc.
  - ✅︎
  - ✅︎
* - `BloomForCausalLM`
  - BLOOM, BLOOMZ, BLOOMChat
  - `bigscience/bloom`, `bigscience/bloomz`, etc.
  -
  - ✅︎
* - `BartForConditionalGeneration`
  - BART
  - `facebook/bart-base`, `facebook/bart-large-cnn`, etc.
  -
  -
* - `ChatGLMModel`
  - ChatGLM
  - `THUDM/chatglm2-6b`, `THUDM/chatglm3-6b`, etc.
  - ✅︎
  - ✅︎
* - `CohereForCausalLM`, `Cohere2ForCausalLM`
  - Command-R
  - `CohereForAI/c4ai-command-r-v01`, `CohereForAI/c4ai-command-r7b-12-2024`, etc.
  - ✅︎
  - ✅︎
* - `DbrxForCausalLM`
  - DBRX
  - `databricks/dbrx-base`, `databricks/dbrx-instruct`, etc.
  -
  - ✅︎
* - `DeciLMForCausalLM`
  - DeciLM
  - `Deci/DeciLM-7B`, `Deci/DeciLM-7B-instruct`, etc.
  -
  - ✅︎
* - `DeepseekForCausalLM`
  - DeepSeek
  - `deepseek-ai/deepseek-llm-67b-base`, `deepseek-ai/deepseek-llm-7b-chat` etc.
  -
  - ✅︎
* - `DeepseekV2ForCausalLM`
  - DeepSeek-V2
  - `deepseek-ai/DeepSeek-V2`, `deepseek-ai/DeepSeek-V2-Chat` etc.
  -
  - ✅︎
* - `DeepseekV3ForCausalLM`
  - DeepSeek-V3
  - `deepseek-ai/DeepSeek-V3-Base`, `deepseek-ai/DeepSeek-V3` etc.
  -
  - ✅︎
* - `ExaoneForCausalLM`
  - EXAONE-3
  - `LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
  - ✅︎
  - ✅︎
* - `FalconForCausalLM`
  - Falcon
  - `tiiuae/falcon-7b`, `tiiuae/falcon-40b`, `tiiuae/falcon-rw-7b`, etc.
  -
  - ✅︎
* - `FalconMambaForCausalLM`
  - FalconMamba
  - `tiiuae/falcon-mamba-7b`, `tiiuae/falcon-mamba-7b-instruct`, etc.
  - ✅︎
  - ✅︎
* - `GemmaForCausalLM`
  - Gemma
  - `google/gemma-2b`, `google/gemma-7b`, etc.
  - ✅︎
  - ✅︎
* - `Gemma2ForCausalLM`
  - Gemma2
  - `google/gemma-2-9b`, `google/gemma-2-27b`, etc.
  - ✅︎
  - ✅︎
* - `GlmForCausalLM`
  - GLM-4
  - `THUDM/glm-4-9b-chat-hf`, etc.
  - ✅︎
  - ✅︎
* - `GPT2LMHeadModel`
  - GPT-2
  - `gpt2`, `gpt2-xl`, etc.
  -
  - ✅︎
* - `GPTBigCodeForCausalLM`
  - StarCoder, SantaCoder, WizardCoder
  - `bigcode/starcoder`, `bigcode/gpt_bigcode-santacoder`, `WizardLM/WizardCoder-15B-V1.0`, etc.
  - ✅︎
  - ✅︎
* - `GPTJForCausalLM`
  - GPT-J
  - `EleutherAI/gpt-j-6b`, `nomic-ai/gpt4all-j`, etc.
  -
  - ✅︎
* - `GPTNeoXForCausalLM`
  - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
  - `EleutherAI/gpt-neox-20b`, `EleutherAI/pythia-12b`, `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, `databricks/dolly-v2-12b`, `stabilityai/stablelm-tuned-alpha-7b`, etc.
  -
  - ✅︎
* - `GraniteForCausalLM`
  - Granite 3.0, Granite 3.1, PowerLM
  - `ibm-granite/granite-3.0-2b-base`, `ibm-granite/granite-3.1-8b-instruct`, `ibm/PowerLM-3b`, etc.
  - ✅︎
  - ✅︎
* - `GraniteMoeForCausalLM`
  - Granite 3.0 MoE, PowerMoE
  - `ibm-granite/granite-3.0-1b-a400m-base`, `ibm-granite/granite-3.0-3b-a800m-instruct`, `ibm/PowerMoE-3b`, etc.
  - ✅︎
  - ✅︎
* - `GritLM`
  - GritLM
  - `parasail-ai/GritLM-7B-vllm`.
  - ✅︎
  - ✅︎
* - `InternLMForCausalLM`
  - InternLM
  - `internlm/internlm-7b`, `internlm/internlm-chat-7b`, etc.
  - ✅︎
  - ✅︎
* - `InternLM2ForCausalLM`
  - InternLM2
  - `internlm/internlm2-7b`, `internlm/internlm2-chat-7b`, etc.
  - ✅︎
  - ✅︎
* - `JAISLMHeadModel`
  - Jais
  - `inceptionai/jais-13b`, `inceptionai/jais-13b-chat`, `inceptionai/jais-30b-v3`, `inceptionai/jais-30b-chat-v3`, etc.
  -
  - ✅︎
* - `JambaForCausalLM`
  - Jamba
  - `ai21labs/AI21-Jamba-1.5-Large`, `ai21labs/AI21-Jamba-1.5-Mini`, `ai21labs/Jamba-v0.1`, etc.
  - ✅︎
  - ✅︎
* - `LlamaForCausalLM`
  - Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
  - `meta-llama/Meta-Llama-3.1-405B-Instruct`, `meta-llama/Meta-Llama-3.1-70B`, `meta-llama/Meta-Llama-3-70B-Instruct`, `meta-llama/Llama-2-70b-hf`, `01-ai/Yi-34B`, etc.
  - ✅︎
  - ✅︎
* - `MambaForCausalLM`
  - Mamba
  - `state-spaces/mamba-130m-hf`, `state-spaces/mamba-790m-hf`, `state-spaces/mamba-2.8b-hf`, etc.
  -
  - ✅︎
* - `MiniCPMForCausalLM`
  - MiniCPM
  - `openbmb/MiniCPM-2B-sft-bf16`, `openbmb/MiniCPM-2B-dpo-bf16`, `openbmb/MiniCPM-S-1B-sft`, etc.
  - ✅︎
  - ✅︎
* - `MiniCPM3ForCausalLM`
  - MiniCPM3
  - `openbmb/MiniCPM3-4B`, etc.
  - ✅︎
  - ✅︎
* - `MistralForCausalLM`
  - Mistral, Mistral-Instruct
  - `mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.1`, etc.
  - ✅︎
  - ✅︎
* - `MixtralForCausalLM`
  - Mixtral-8x7B, Mixtral-8x7B-Instruct
  - `mistralai/Mixtral-8x7B-v0.1`, `mistralai/Mixtral-8x7B-Instruct-v0.1`, `mistral-community/Mixtral-8x22B-v0.1`, etc.
  - ✅︎
  - ✅︎
* - `MPTForCausalLM`
  - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
  - `mosaicml/mpt-7b`, `mosaicml/mpt-7b-storywriter`, `mosaicml/mpt-30b`, etc.
  -
  - ✅︎
* - `NemotronForCausalLM`
  - Nemotron-3, Nemotron-4, Minitron
  - `nvidia/Minitron-8B-Base`, `mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
  - ✅︎
  - ✅︎
* - `OLMoForCausalLM`
  - OLMo
  - `allenai/OLMo-1B-hf`, `allenai/OLMo-7B-hf`, etc.
  -
  - ✅︎
* - `OLMo2ForCausalLM`
  - OLMo2
  - `allenai/OLMo2-7B-1124`, etc.
  -
  - ✅︎
* - `OLMoEForCausalLM`
  - OLMoE
  - `allenai/OLMoE-1B-7B-0924`, `allenai/OLMoE-1B-7B-0924-Instruct`, etc.
  - ✅︎
  - ✅︎
* - `OPTForCausalLM`
  - OPT, OPT-IML
  - `facebook/opt-66b`, `facebook/opt-iml-max-30b`, etc.
  -
  - ✅︎
* - `OrionForCausalLM`
  - Orion
  - `OrionStarAI/Orion-14B-Base`, `OrionStarAI/Orion-14B-Chat`, etc.
  -
  - ✅︎
* - `PhiForCausalLM`
  - Phi
  - `microsoft/phi-1_5`, `microsoft/phi-2`, etc.
  - ✅︎
  - ✅︎
* - `Phi3ForCausalLM`
  - Phi-3
  - `microsoft/Phi-3-mini-4k-instruct`, `microsoft/Phi-3-mini-128k-instruct`, `microsoft/Phi-3-medium-128k-instruct`, etc.
  - ✅︎
  - ✅︎
* - `Phi3SmallForCausalLM`
  - Phi-3-Small
  - `microsoft/Phi-3-small-8k-instruct`, `microsoft/Phi-3-small-128k-instruct`, etc.
  -
  - ✅︎
* - `PhiMoEForCausalLM`
  - Phi-3.5-MoE
  - `microsoft/Phi-3.5-MoE-instruct`, etc.
  - ✅︎
  - ✅︎
* - `PersimmonForCausalLM`
  - Persimmon
  - `adept/persimmon-8b-base`, `adept/persimmon-8b-chat`, etc.
  -
  - ✅︎
* - `QWenLMHeadModel`
  - Qwen
  - `Qwen/Qwen-7B`, `Qwen/Qwen-7B-Chat`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2ForCausalLM`
  - Qwen2
  - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2MoeForCausalLM`
  - Qwen2MoE
  - `Qwen/Qwen1.5-MoE-A2.7B`, `Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
  -
  - ✅︎
* - `StableLmForCausalLM`
  - StableLM
  - `stabilityai/stablelm-3b-4e1t`, `stabilityai/stablelm-base-alpha-7b-v2`, etc.
  -
  - ✅︎
* - `Starcoder2ForCausalLM`
  - Starcoder2
  - `bigcode/starcoder2-3b`, `bigcode/starcoder2-7b`, `bigcode/starcoder2-15b`, etc.
  -
  - ✅︎
* - `SolarForCausalLM`
  - Solar Pro
  - `upstage/solar-pro-preview-instruct`, etc.
  - ✅︎
  - ✅︎
* - `TeleChat2ForCausalLM`
  - TeleChat2
  - `TeleAI/TeleChat2-3B`, `TeleAI/TeleChat2-7B`, `TeleAI/TeleChat2-35B`, etc.
  - ✅︎
  - ✅︎
* - `XverseForCausalLM`
  - XVERSE
  - `xverse/XVERSE-7B-Chat`, `xverse/XVERSE-13B-Chat`, `xverse/XVERSE-65B-Chat`, etc.
  - ✅︎
  - ✅︎
```
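
As a usage sketch, any architecture above can be loaded through the offline `LLM` API; `gpt2` is chosen here only because it is small, and the `task` keyword is assumed to mirror the `--task` CLI flag.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="gpt2", task="generate")
params = SamplingParams(temperature=0.8, max_tokens=32)

# One prompt in, one RequestOutput out; each output holds the generated text.
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```
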
@@ -374,49 +373,48 @@ you should explicitly specify the task type to ensure that the model is used in
#### Text Embedding (`--task embed`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `BertModel`
  - BERT-based
  - `BAAI/bge-base-en-v1.5`, etc.
  -
  -
* - `Gemma2Model`
  - Gemma2-based
  - `BAAI/bge-multilingual-gemma2`, etc.
  -
  - ✅︎
* - `GritLM`
  - GritLM
  - `parasail-ai/GritLM-7B-vllm`.
  - ✅︎
  - ✅︎
* - `LlamaModel`, `LlamaForCausalLM`, `MistralModel`, etc.
  - Llama-based
  - `intfloat/e5-mistral-7b-instruct`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2Model`, `Qwen2ForCausalLM`
  - Qwen2-based
  - `ssmits/Qwen2-7B-Instruct-embed-base` (see note), `Alibaba-NLP/gte-Qwen2-7B-instruct` (see note), etc.
  - ✅︎
  - ✅︎
* - `RobertaModel`, `RobertaForMaskedLM`
  - RoBERTa-based
  - `sentence-transformers/all-roberta-large-v1`, etc.
  -
  -
* - `XLMRobertaModel`
  - XLM-RoBERTa-based
  - `intfloat/multilingual-e5-large`, etc.
  -
  -
```
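
A minimal embedding sketch, assuming the `LLM.embed()` convenience method available in recent vLLM releases and using one model from the table:

```python
from vllm import LLM

# `task="embed"` is assumed to mirror the `--task embed` flag.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")

(output,) = llm.embed("A sentence to embed.")
# The embedding vector is assumed to be exposed as `outputs.embedding`.
print(len(output.outputs.embedding))
```
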
@@ -440,29 +438,28 @@ of the whole prompt are extracted from the normalized hidden state corresponding
#### Reward Modeling (`--task reward`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `InternLM2ForRewardModel`
  - InternLM2-based
  - `internlm/internlm2-1_8b-reward`, `internlm/internlm2-7b-reward`, etc.
  - ✅︎
  - ✅︎
* - `LlamaForCausalLM`
  - Llama-based
  - `peiyi9979/math-shepherd-mistral-7b-prm`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2ForRewardModel`
  - Qwen2-based
  - `Qwen/Qwen2.5-Math-RM-72B`, etc.
  - ✅︎
  - ✅︎
```
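
A hedged sketch of loading a reward model for offline pooling; the `task` keyword is assumed to mirror `--task reward`, and the model is one entry from the table.

```python
from vllm import LLM

# InternLM2 reward checkpoints require trust_remote_code=True.
llm = LLM(
    model="internlm/internlm2-1_8b-reward",
    task="reward",
    trust_remote_code=True,
)

(output,) = llm.encode("The answer is 42.")
# The pooling output object carries the reward values produced by the model head.
print(output.outputs)
```
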
@@ -477,24 +474,23 @@ e.g.: {code}`--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 1
#### Classification (`--task classify`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `JambaForSequenceClassification`
  - Jamba
  - `ai21labs/Jamba-tiny-reward-dev`, etc.
  - ✅︎
  - ✅︎
* - `Qwen2ForSequenceClassification`
  - Qwen2-based
  - `jason9693/Qwen2.5-1.5B-apeach`, etc.
  - ✅︎
  - ✅︎
```
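
A classification sketch, assuming the `LLM.classify()` helper from recent releases and one model from the table:

```python
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", task="classify")

(output,) = llm.classify("vLLM is a high-throughput inference engine.")
# The classification output holds the per-class probabilities.
print(output.outputs)
```
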
@@ -504,29 +500,28 @@ If your model is not in the above list, we will try to automatically convert the
#### Sentence Pair Scoring (`--task score`)
```{list-table}
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
  - Models
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `BertForSequenceClassification`
  - BERT-based
  - `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc.
  -
  -
* - `RobertaForSequenceClassification`
  - RoBERTa-based
  - `cross-encoder/quora-roberta-base`, etc.
  -
  -
* - `XLMRobertaForSequenceClassification`
  - XLM-RoBERTa-based
  - `BAAI/bge-reranker-v2-m3`, etc.
  -
  -
```
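
A cross-encoder scoring sketch, assuming the `LLM.score()` helper from recent releases and one model from the table:

```python
from vllm import LLM

llm = LLM(model="cross-encoder/ms-marco-MiniLM-L-6-v2", task="score")

# Score a query against a candidate passage.
(output,) = llm.score(
    "What is the capital of France?",
    "Paris is the capital and most populous city of France.",
)
print(output.outputs)
```
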
@@ -558,186 +553,182 @@ See [this page](#generative-models) for more information on how to use generativ
#### Text Generation (`--task generate`)
```{list-table}
:widths: 25 25 15 20 5 5 5
:header-rows: 1

* - Architecture
  - Models
  - Inputs
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
  - [V1](gh-issue:8779)
* - `AriaForConditionalGeneration`
  - Aria
  - T + I
  - `rhymes-ai/Aria`
  -
  - ✅︎
  -
* - `Blip2ForConditionalGeneration`
  - BLIP-2
  - T + I<sup>E</sup>
  - `Salesforce/blip2-opt-2.7b`, `Salesforce/blip2-opt-6.7b`, etc.
  -
  - ✅︎
  -
* - `ChameleonForConditionalGeneration`
  - Chameleon
  - T + I
  - `facebook/chameleon-7b` etc.
  -
  - ✅︎
  -
* - `FuyuForCausalLM`
  - Fuyu
  - T + I
  - `adept/fuyu-8b` etc.
  -
  - ✅︎
  -
* - `ChatGLMModel`
  - GLM-4V
  - T + I
  - `THUDM/glm-4v-9b` etc.
  - ✅︎
  - ✅︎
  -
* - `H2OVLChatModel`
  - H2OVL
  - T + I<sup>E+</sup>
  - `h2oai/h2ovl-mississippi-800m`, `h2oai/h2ovl-mississippi-2b`, etc.
  -
  - ✅︎
  -
* - `Idefics3ForConditionalGeneration`
  - Idefics3
  - T + I
  - `HuggingFaceM4/Idefics3-8B-Llama3` etc.
  - ✅︎
  -
  -
* - `InternVLChatModel`
  - InternVL 2.5, Mono-InternVL, InternVL 2.0
  - T + I<sup>E+</sup>
  - `OpenGVLab/InternVL2_5-4B`, `OpenGVLab/Mono-InternVL-2B`, `OpenGVLab/InternVL2-4B`, etc.
  -
  - ✅︎
  - ✅︎
* - `LlavaForConditionalGeneration`
  - LLaVA-1.5
  - T + I<sup>E+</sup>
  - `llava-hf/llava-1.5-7b-hf`, `TIGER-Lab/Mantis-8B-siglip-llama3` (see note), etc.
  -
  - ✅︎
  - ✅︎
* - `LlavaNextForConditionalGeneration`
  - LLaVA-NeXT
  - T + I<sup>E+</sup>
  - `llava-hf/llava-v1.6-mistral-7b-hf`, `llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
  -
  - ✅︎
  -
* - `LlavaNextVideoForConditionalGeneration`
  - LLaVA-NeXT-Video
  - T + V
  - `llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
  -
  - ✅︎
  -
* - `LlavaOnevisionForConditionalGeneration`
  - LLaVA-Onevision
  - T + I<sup>+</sup> + V<sup>+</sup>
  - `llava-hf/llava-onevision-qwen2-7b-ov-hf`, `llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
  -
  - ✅︎
  -
* - `MiniCPMV`
  - MiniCPM-V
  - T + I<sup>E+</sup>
  - `openbmb/MiniCPM-V-2` (see note), `openbmb/MiniCPM-Llama3-V-2_5`, `openbmb/MiniCPM-V-2_6`, etc.
  - ✅︎
  - ✅︎
  -
* - `MllamaForConditionalGeneration`
  - Llama 3.2
  - T + I<sup>+</sup>
  - `meta-llama/Llama-3.2-90B-Vision-Instruct`, `meta-llama/Llama-3.2-11B-Vision`, etc.
  -
  -
  -
* - `MolmoForCausalLM`
  - Molmo
  - T + I
  - `allenai/Molmo-7B-D-0924`, `allenai/Molmo-72B-0924`, etc.
  -
  - ✅︎
  - ✅︎
* - `NVLM_D_Model`
  - NVLM-D 1.0
  - T + I<sup>E+</sup>
  - `nvidia/NVLM-D-72B`, etc.
  -
  - ✅︎
  - ✅︎
* - `PaliGemmaForConditionalGeneration`
  - PaliGemma, PaliGemma 2
  - T + I<sup>E</sup>
  - `google/paligemma-3b-pt-224`, `google/paligemma-3b-mix-224`, `google/paligemma2-3b-ft-docci-448`, etc.
  -
  - ✅︎
  -
* - `Phi3VForCausalLM`
  - Phi-3-Vision, Phi-3.5-Vision
  - T + I<sup>E+</sup>
  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
  -
  - ✅︎
  - ✅︎
* - `PixtralForConditionalGeneration`
  - Pixtral
  - T + I<sup>+</sup>
  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
  -
  - ✅︎
  - ✅︎
* - `QWenLMHeadModel`
  - Qwen-VL
  - T + I<sup>E+</sup>
  - `Qwen/Qwen-VL`, `Qwen/Qwen-VL-Chat`, etc.
  - ✅︎
  - ✅︎
  -
* - `Qwen2AudioForConditionalGeneration`
  - Qwen2-Audio
  - T + A<sup>+</sup>
  - `Qwen/Qwen2-Audio-7B-Instruct`
  -
  - ✅︎
  -
* - `Qwen2VLForConditionalGeneration`
  - Qwen2-VL
  - T + I<sup>E+</sup> + V<sup>E+</sup>
  - `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
  - ✅︎
  - ✅︎
  -
* - `UltravoxModel`
  - Ultravox
  - T + A<sup>E+</sup>
  - `fixie-ai/ultravox-v0_3`
  -
  - ✅︎
  -
```
<sup>E</sup> Pre-computed embeddings can be inputted for this modality.
<sup>+</sup> Multiple items can be inputted per text prompt for this modality.
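
A minimal multimodal sketch for a "T + I" model from the table; the image path and the LLaVA-1.5 prompt template are illustrative.

```python
from PIL import Image

from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")
image = Image.open("example.jpg")  # placeholder path

outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is shown in this image? ASSISTANT:",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
```
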
````{important}
To enable multiple multi-modal items per text prompt, you have to set {code}`limit_mm_per_prompt` (offline inference)
@@ -787,38 +778,37 @@ To get the best results, you should use pooling models that are specifically tra
The following table lists those that are tested in vLLM.
```{list-table}
:widths: 25 25 15 25 5 5
:header-rows: 1

* - Architecture
  - Models
  - Inputs
  - Example HF Models
  - [LoRA](#lora-adapter)
  - [PP](#distributed-serving)
* - `LlavaNextForConditionalGeneration`
  - LLaVA-NeXT-based
  - T / I
  - `royokong/e5-v`
  -
  - ✅︎
* - `Phi3VForCausalLM`
  - Phi-3-Vision-based
  - T + I
  - `TIGER-Lab/VLM2Vec-Full`
  - 🚧
  - ✅︎
* - `Qwen2VLForConditionalGeneration`
  - Qwen2-VL-based
  - T + I
  - `MrLight/dse-qwen2-2b-mrl-v1`
  -
  - ✅︎
```
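
A hedged sketch of multimodal embedding with one model from the table; the prompt template is a placeholder (consult the model card for the exact format), and `task="embed"` is assumed to mirror `--task embed`.

```python
from PIL import Image

from vllm import LLM

llm = LLM(model="royokong/e5-v", task="embed")
image = Image.open("example.jpg")  # placeholder path

(output,) = llm.encode({
    "prompt": "<image>\nSummarize the image in one word:",  # illustrative template
    "multi_modal_data": {"image": image},
})
print(output.outputs)
```
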
_________________
# Model Support Policy


@@ -4,12 +4,11 @@
The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:
```{list-table}
:header-rows: 1
:widths: 20 8 8 8 8 8 8 8 8 8 8

* - Implementation
  - Volta
  - Turing
  - Ampere
@@ -20,7 +19,7 @@ The table below shows the compatibility of various quantization implementations
  - x86 CPU
  - AWS Inferentia
  - Google TPU
* - AWQ
  - ✗
  - ✅︎
  - ✅︎
@@ -31,7 +30,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - GPTQ
  - ✅︎
  - ✅︎
  - ✅︎
@@ -42,7 +41,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - Marlin (GPTQ/AWQ/FP8)
  - ✗
  - ✗
  - ✅︎
@@ -53,7 +52,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - INT8 (W8A8)
  - ✗
  - ✅︎
  - ✅︎
@@ -64,7 +63,7 @@ The table below shows the compatibility of various quantization implementations
  - ✅︎
  - ✗
  - ✗
* - FP8 (W8A8)
  - ✗
  - ✗
  - ✗
@@ -75,7 +74,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - AQLM
  - ✅︎
  - ✅︎
  - ✅︎
@@ -86,7 +85,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - bitsandbytes
  - ✅︎
  - ✅︎
  - ✅︎
@@ -97,7 +96,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - DeepSpeedFP
  - ✅︎
  - ✅︎
  - ✅︎
@@ -108,7 +107,7 @@ The table below shows the compatibility of various quantization implementations
  - ✗
  - ✗
  - ✗
* - GGUF
  - ✅︎
  - ✅︎
  - ✅︎
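
As a usage sketch for the table above, a pre-quantized checkpoint can be loaded directly through the offline `LLM` API; the model name is illustrative, and the explicit `quantization` argument is usually optional because the method is inferred from the checkpoint config.

```python
from vllm import LLM

# Load an AWQ checkpoint; "awq" is one of the implementations listed above.
llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
print(llm.generate("Hello, my name is")[0].outputs[0].text)
```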


@@ -43,208 +43,207 @@ chart **including persistent volumes** and deletes the release.
## Values
```{list-table}
:widths: 25 25 25 25
:header-rows: 1

* - Key
  - Type
  - Default
  - Description
* - autoscaling
  - object
  - {"enabled":false,"maxReplicas":100,"minReplicas":1,"targetCPUUtilizationPercentage":80}
  - Autoscaling configuration
* - autoscaling.enabled
  - bool
  - false
  - Enable autoscaling
* - autoscaling.maxReplicas
  - int
  - 100
  - Maximum replicas
* - autoscaling.minReplicas
  - int
  - 1
  - Minimum replicas
* - autoscaling.targetCPUUtilizationPercentage
  - int
  - 80
  - Target CPU utilization for autoscaling
* - configs
  - object
  - {}
  - Configmap
* - containerPort
  - int
  - 8000
  - Container port
* - customObjects
  - list
  - []
  - Custom Objects configuration
* - deploymentStrategy
  - object
  - {}
  - Deployment strategy configuration
* - externalConfigs
  - list
  - []
  - External configuration
* - extraContainers
  - list
  - []
  - Additional containers configuration
* - extraInit
  - object
  - {"pvcStorage":"1Gi","s3modelpath":"relative_s3_model_path/opt-125m", "awsEc2MetadataDisabled": true}
  - Additional configuration for the init container
* - extraInit.pvcStorage
  - string
  - "50Gi"
  - Storage size of the s3
* - extraInit.s3modelpath
  - string
  - "relative_s3_model_path/opt-125m"
  - Path of the model on the s3 which hosts model weights and config files
* - extraInit.awsEc2MetadataDisabled
  - boolean
  - true
  - Disables the use of the Amazon EC2 instance metadata service
* - extraPorts
  - list
  - []
  - Additional ports configuration
* - gpuModels
  - list
  - ["TYPE_GPU_USED"]
  - Type of gpu used
* - image
  - object
  - {"command":["vllm","serve","/data/","--served-model-name","opt-125m","--host","0.0.0.0","--port","8000"],"repository":"vllm/vllm-openai","tag":"latest"}
  - Image configuration
* - image.command
  - list
  - ["vllm","serve","/data/","--served-model-name","opt-125m","--host","0.0.0.0","--port","8000"]
  - Container launch command
* - image.repository
  - string
  - "vllm/vllm-openai"
  - Image repository
* - image.tag
  - string
  - "latest"
  - Image tag
* - livenessProbe
  - object
  - {"failureThreshold":3,"httpGet":{"path":"/health","port":8000},"initialDelaySeconds":15,"periodSeconds":10}
  - Liveness probe configuration
* - livenessProbe.failureThreshold
  - int
  - 3
  - Number of times after which if a probe fails in a row, Kubernetes considers that the overall check has failed: the container is not alive
* - livenessProbe.httpGet
  - object
  - {"path":"/health","port":8000}
  - Configuration of the Kubelet http request on the server
* - livenessProbe.httpGet.path
  - string
  - "/health"
  - Path to access on the HTTP server
* - livenessProbe.httpGet.port
  - int
  - 8000
  - Name or number of the port to access on the container, on which the server is listening
* - livenessProbe.initialDelaySeconds
  - int
  - 15
  - Number of seconds after the container has started before liveness probe is initiated
* - livenessProbe.periodSeconds
  - int
  - 10
  - How often (in seconds) to perform the liveness probe
* - maxUnavailablePodDisruptionBudget
  - string
  - ""
  - Disruption Budget Configuration
* - readinessProbe
  - object
  - {"failureThreshold":3,"httpGet":{"path":"/health","port":8000},"initialDelaySeconds":5,"periodSeconds":5}
  - Readiness probe configuration
* - readinessProbe.failureThreshold
  - int
  - 3
  - Number of times after which if a probe fails in a row, Kubernetes considers that the overall check has failed: the container is not ready
* - readinessProbe.httpGet
  - object
  - {"path":"/health","port":8000}
  - Configuration of the Kubelet http request on the server
* - readinessProbe.httpGet.path
  - string
  - "/health"
  - Path to access on the HTTP server
* - readinessProbe.httpGet.port
  - int
  - 8000
  - Name or number of the port to access on the container, on which the server is listening
* - readinessProbe.initialDelaySeconds
  - int
  - 5
  - Number of seconds after the container has started before readiness probe is initiated
* - readinessProbe.periodSeconds
  - int
  - 5
  - How often (in seconds) to perform the readiness probe
* - replicaCount
  - int
  - 1
  - Number of replicas
* - resources
  - object
  - {"limits":{"cpu":4,"memory":"16Gi","nvidia.com/gpu":1},"requests":{"cpu":4,"memory":"16Gi","nvidia.com/gpu":1}}
  - Resource configuration
* - resources.limits."nvidia.com/gpu"
  - int
  - 1
  - Number of gpus used
* - resources.limits.cpu
  - int
  - 4
  - Number of CPUs
* - resources.limits.memory
  - string
  - "16Gi"
  - CPU memory configuration
* - resources.requests."nvidia.com/gpu"
  - int
  - 1
  - Number of gpus used
* - resources.requests.cpu
  - int
  - 4
  - Number of CPUs
* - resources.requests.memory
  - string
  - "16Gi"
  - CPU memory configuration
* - secrets
  - object
  - {}
  - Secrets configuration
* - serviceName
  - string
  -
  - Service name
* - servicePort
  - int
  - 80
  - Service port
* - labels.environment
  - string
  - test
  - Environment name
* - labels.release
  - string
  - test
  - Release name
```