[Misc] Provide correct Pixtral-HF chat template (#11891)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung authored on 2025-01-10 01:19:37 +08:00, committed by GitHub
parent bd82872211
commit 9a228348d2
3 changed files with 73 additions and 27 deletions

View File

@@ -322,7 +322,7 @@ See [this page](#generative-models) for more information on how to use generative models.
   - ✅︎
   - ✅︎
 * - `Qwen2ForCausalLM`
-  - Qwen2
+  - QwQ, Qwen2
   - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
   - ✅︎
   - ✅︎
@@ -436,7 +436,7 @@ loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/t
 ```
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
+{func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
 of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
 #### Reward Modeling (`--task reward`)
@@ -468,7 +468,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
 ```
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
+{func}`~vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
 ```{important}
 For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
@@ -499,7 +499,7 @@ e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "r
 ```
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
+{func}`~vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
 #### Sentence Pair Scoring (`--task score`)
@@ -550,6 +550,28 @@ On the other hand, modalities separated by `/` are mutually exclusive.
 See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the model.
+````{important}
+To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
+or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
+Offline inference:
+```python
+llm = LLM(
+    model="Qwen/Qwen2-VL-7B-Instruct",
+    limit_mm_per_prompt={"image": 4},
+)
+```
+Online inference:
+```bash
+vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
+```
+````
+```{note}
+vLLM currently only supports adding LoRA to the language backbone of multimodal models.
+```
 ### Generative Models
 See [this page](#generative-models) for more information on how to use generative models.
@@ -689,14 +711,14 @@ See [this page](#generative-models) for more information on how to use generative models.
 * - `Phi3VForCausalLM`
   - Phi-3-Vision, Phi-3.5-Vision
   - T + I<sup>E+</sup>
-  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
+  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct`, etc.
   -
   - ✅︎
   - ✅︎
 * - `PixtralForConditionalGeneration`
   - Pixtral
   - T + I<sup>+</sup>
-  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
+  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` (see note), etc.
   -
   - ✅︎
   - ✅︎
@@ -715,7 +737,7 @@ See [this page](#generative-models) for more information on how to use generative models.
   - ✅︎
   - ✅︎
 * - `Qwen2VLForConditionalGeneration`
-  - Qwen2-VL
+  - QVQ, Qwen2-VL
   - T + I<sup>E+</sup> + V<sup>E+</sup>
   - `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
   - ✅︎
@@ -733,26 +755,6 @@ See [this page](#generative-models) for more information on how to use generative models.
 <sup>E</sup> Pre-computed embeddings can be inputted for this modality.
 <sup>+</sup> Multiple items can be inputted per text prompt for this modality.
-````{important}
-To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
-or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
-```python
-llm = LLM(
-    model="Qwen/Qwen2-VL-7B-Instruct",
-    limit_mm_per_prompt={"image": 4},
-)
-```
-```bash
-vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
-```
-````
-```{note}
-vLLM currently only supports adding LoRA to the language backbone of multimodal models.
-```
 ```{note}
 To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
 ```
@@ -762,6 +764,11 @@ The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`
 For more details, please see: <gh-pr:4087#issuecomment-2250397630>
 ```
+```{note}
+The chat template for Pixtral-HF is incorrect (see [discussion](https://huggingface.co/mistral-community/pixtral-12b/discussions/22)).
+A corrected version is available at <gh-file:examples/template_pixtral_hf.jinja>.
+```
 ### Pooling Models
 See [this page](pooling-models) for more information on how to use pooling models.
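The corrected template added by this commit can also be passed to the server explicitly. A minimal sketch, assuming the command runs from a vLLM checkout so the relative path resolves; the model name is just one HF-format Pixtral checkpoint:

```bash
# Illustrative only: serve an HF-format Pixtral checkpoint with the corrected
# chat template added in this commit (path relative to the vLLM repository root).
vllm serve mistral-community/pixtral-12b \
    --chat-template examples/template_pixtral_hf.jinja
```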

View File

@@ -0,0 +1,38 @@
+{%- if messages[0]["role"] == "system" %}
+    {%- set system_message = messages[0]["content"] %}
+    {%- set loop_messages = messages[1:] %}
+{%- else %}
+    {%- set loop_messages = messages %}
+{%- endif %}
+
+{{- bos_token }}
+{%- for message in loop_messages %}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
+        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
+    {%- endif %}
+    {%- if message["role"] == "user" %}
+        {%- if loop.last and system_message is defined %}
+            {{- "[INST]" + system_message + "\n" }}
+        {%- else %}
+            {{- "[INST]" }}
+        {%- endif %}
+        {%- if message["content"] is not string %}
+            {%- for chunk in message["content"] %}
+                {%- if chunk["type"] == "text" %}
+                    {{- chunk["text"] }}
+                {%- elif chunk["type"] == "image" %}
+                    {{- "[IMG]" }}
+                {%- else %}
+                    {{- raise_exception("Unrecognized content type!") }}
+                {%- endif %}
+            {%- endfor %}
+        {%- else %}
+            {{- message["content"] }}
+        {%- endif %}
+        {{- "[/INST]" }}
+    {%- elif message["role"] == "assistant" %}
+        {{- message["content"] + eos_token}}
+    {%- else %}
+        {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
+    {%- endif %}
+{%- endfor %}
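For reference, here is a minimal standalone sketch (not part of this commit) that renders the template above with plain Jinja2 on an OpenAI-style message list; the working directory, messages, and special tokens are illustrative assumptions:

```python
# Sketch: render the corrected Pixtral-HF template outside of vLLM to inspect
# the prompt it produces. Assumes the current directory is a vLLM checkout.
import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader("examples"))

def raise_exception(message):
    # The template calls raise_exception(); provide it as a template global.
    raise jinja2.TemplateError(message)

env.globals["raise_exception"] = raise_exception
template = env.get_template("template_pixtral_hf.jinja")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image"},
        ],
    },
]

prompt = template.render(messages=messages, bos_token="<s>", eos_token="</s>")
print(prompt)
# Expected shape (newline written as \n):
# <s>[INST]You are a helpful assistant.\nWhat is shown in this image?[IMG][/INST]
```

The system message is folded into the final user turn and each image chunk becomes an `[IMG]` placeholder; because the template consumes content as a list of typed chunks rather than a flat string, the test below registers it with the `openai` content format.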

View File

@@ -758,6 +758,7 @@ def test_resolve_content_format_hf_defined(model, expected_format):
     ("template_falcon.jinja", "string"),
     ("template_inkbot.jinja", "string"),
     ("template_llava.jinja", "string"),
+    ("template_pixtral_hf.jinja", "openai"),
     ("template_vlm2vec.jinja", "openai"),
     ("tool_chat_template_granite_20b_fc.jinja", "string"),
     ("tool_chat_template_hermes.jinja", "string"),