[Misc] Provide correct Pixtral-HF chat template (#11891)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

commit 9a228348d2 (parent bd82872211)
@@ -322,7 +322,7 @@ See [this page](#generative-models) for more information on how to use generativ
   - ✅︎
   - ✅︎
 * - `Qwen2ForCausalLM`
-  - Qwen2
+  - QwQ, Qwen2
   - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
   - ✅︎
   - ✅︎
@@ -436,7 +436,7 @@ loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/t
 ```
 
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
+{func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
 of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
 
 #### Reward Modeling (`--task reward`)
@@ -468,7 +468,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding
 ```
 
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
+{func}`~vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
 
 ```{important}
 For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
@@ -499,7 +499,7 @@ e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "r
 ```
 
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
+{func}`~vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
 
 #### Sentence Pair Scoring (`--task score`)
 
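The cross-reference fixes in the three hunks above all point at the adapter functions that perform this automatic conversion. As a hedged sketch of what the conversion amounts to (normally vLLM applies the adapter for you when a generative architecture is run with a pooling `--task`; calling it by hand as below is purely illustrative):

```python
# Hedged sketch: applying the embedding adapter manually. vLLM normally does
# this automatically when a generative model is loaded for a pooling task;
# as_reward_model and as_classification_model work analogously.
from vllm.model_executor.models.adapters import as_embedding_model
from vllm.model_executor.models.qwen2 import Qwen2ForCausalLM

# Wraps the generative class so that it pools an embedding from the hidden
# state of the last token, as described in the documentation text above.
Qwen2EmbeddingModel = as_embedding_model(Qwen2ForCausalLM)
print(Qwen2EmbeddingModel)
```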
@@ -550,6 +550,28 @@ On the other hand, modalities separated by `/` are mutually exclusive.
 
 See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the model.
 
+````{important}
+To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
+or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
+
+Offline inference:
+```python
+llm = LLM(
+    model="Qwen/Qwen2-VL-7B-Instruct",
+    limit_mm_per_prompt={"image": 4},
+)
+```
+
+Online inference:
+```bash
+vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
+```
+````
+
+```{note}
+vLLM currently only supports adding LoRA to the language backbone of multimodal models.
+```
+
 ### Generative Models
 
 See [this page](#generative-models) for more information on how to use generative models.
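On the online side, raising the limit with `--limit-mm-per-prompt image=4` only allows more items per request; each request still supplies its images as ordinary OpenAI-style content parts. A minimal sketch against the OpenAI-compatible server (base URL, API key, and image URLs below are placeholder assumptions):

```python
# Hedged sketch: two images in one chat request to a vLLM server started with
#   vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
# The base URL, API key, and image URLs are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is different between these two images?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/b.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```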
@@ -689,14 +711,14 @@ See [this page](#generative-models) for more information on how to use generativ
 * - `Phi3VForCausalLM`
   - Phi-3-Vision, Phi-3.5-Vision
   - T + I<sup>E+</sup>
-  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
+  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct`, etc.
   -
   - ✅︎
   - ✅︎
 * - `PixtralForConditionalGeneration`
   - Pixtral
   - T + I<sup>+</sup>
-  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
+  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` (see note), etc.
   -
   - ✅︎
   - ✅︎
@@ -715,7 +737,7 @@ See [this page](#generative-models) for more information on how to use generativ
   - ✅︎
   - ✅︎
 * - `Qwen2VLForConditionalGeneration`
-  - Qwen2-VL
+  - QVQ, Qwen2-VL
   - T + I<sup>E+</sup> + V<sup>E+</sup>
   - `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
   - ✅︎
@@ -733,26 +755,6 @@ See [this page](#generative-models) for more information on how to use generativ
 <sup>E</sup> Pre-computed embeddings can be inputted for this modality.
 <sup>+</sup> Multiple items can be inputted per text prompt for this modality.
 
-````{important}
-To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
-or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
-
-```python
-llm = LLM(
-    model="Qwen/Qwen2-VL-7B-Instruct",
-    limit_mm_per_prompt={"image": 4},
-)
-```
-
-```bash
-vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
-```
-````
-
-```{note}
-vLLM currently only supports adding LoRA to the language backbone of multimodal models.
-```
-
 ```{note}
 To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
 ```
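The `--hf_overrides` flag in the Mantis note corresponds to the `hf_overrides` argument of `LLM` for offline use. A hedged sketch (the override dict is copied from the note; the rest is assumed):

```python
# Hedged sketch: offline equivalent of the --hf_overrides flag from the note above.
from vllm import LLM

llm = LLM(
    model="TIGER-Lab/Mantis-8B-siglip-llama3",
    hf_overrides={"architectures": ["MantisForConditionalGeneration"]},
)
```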
@@ -762,6 +764,11 @@ The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`
 For more details, please see: <gh-pr:4087#issuecomment-2250397630>
 ```
 
+```{note}
+The chat template for Pixtral-HF is incorrect (see [discussion](https://huggingface.co/mistral-community/pixtral-12b/discussions/22)).
+A corrected version is available at <gh-file:examples/template_pixtral_hf.jinja>.
+```
+
 ### Pooling Models
 
 See [this page](pooling-models) for more information on how to use pooling models.
examples/template_pixtral_hf.jinja (new file, 38 lines)

@@ -0,0 +1,38 @@
+{%- if messages[0]["role"] == "system" %}
+    {%- set system_message = messages[0]["content"] %}
+    {%- set loop_messages = messages[1:] %}
+{%- else %}
+    {%- set loop_messages = messages %}
+{%- endif %}
+
+{{- bos_token }}
+{%- for message in loop_messages %}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
+        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
+    {%- endif %}
+    {%- if message["role"] == "user" %}
+        {%- if loop.last and system_message is defined %}
+            {{- "[INST]" + system_message + "\n" }}
+        {%- else %}
+            {{- "[INST]" }}
+        {%- endif %}
+        {%- if message["content"] is not string %}
+            {%- for chunk in message["content"] %}
+                {%- if chunk["type"] == "text" %}
+                    {{- chunk["text"] }}
+                {%- elif chunk["type"] == "image" %}
+                    {{- "[IMG]" }}
+                {%- else %}
+                    {{- raise_exception("Unrecognized content type!") }}
+                {%- endif %}
+            {%- endfor %}
+        {%- else %}
+            {{- message["content"] }}
+        {%- endif %}
+        {{- "[/INST]" }}
+    {%- elif message["role"] == "assistant" %}
+        {{- message["content"] + eos_token}}
+    {%- else %}
+        {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
+    {%- endif %}
+{%- endfor %}
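One way to exercise the corrected template offline is to pass its contents to `LLM.chat`; the OpenAI-compatible server would instead take `--chat-template examples/template_pixtral_hf.jinja`. This is a hedged sketch, not taken from the PR: the image URL is a placeholder and the keyword arguments are assumed to be available in your vLLM version.

```python
# Hedged sketch: offline chat with the Pixtral-HF checkpoint using the
# corrected template. The image URL is a placeholder assumption.
from vllm import LLM

with open("examples/template_pixtral_hf.jinja") as f:
    chat_template = f.read()

llm = LLM(model="mistral-community/pixtral-12b", limit_mm_per_prompt={"image": 2})
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/demo.png"}},
    ],
}]
outputs = llm.chat(messages, chat_template=chat_template)
print(outputs[0].outputs[0].text)
```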
@@ -758,6 +758,7 @@ def test_resolve_content_format_hf_defined(model, expected_format):
     ("template_falcon.jinja", "string"),
     ("template_inkbot.jinja", "string"),
     ("template_llava.jinja", "string"),
+    ("template_pixtral_hf.jinja", "openai"),
     ("template_vlm2vec.jinja", "openai"),
     ("tool_chat_template_granite_20b_fc.jinja", "string"),
     ("tool_chat_template_hermes.jinja", "string"),
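The test line added above records that `template_pixtral_hf.jinja` expects the "openai" content format rather than plain strings. Illustratively, the two shapes look like this (the chunk types mirror what the template itself handles):

```python
# "string" content format: the whole user turn is a single string.
string_message = {"role": "user", "content": "Describe this image."}

# "openai" content format: content is a list of typed chunks, which is what
# template_pixtral_hf.jinja iterates over ("text" and "image" chunks).
openai_message = {
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ],
}
```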