[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung authored on 2025-03-11 22:59:43 +08:00 (committed by GitHub)
parent a1c8f3796c
commit af295e9b01
2 changed files with 6 additions and 9 deletions


@@ -541,14 +541,11 @@ You should manually set mean pooling by passing `--override-pooler-config '{"poo
 :::
 
 :::{note}
-Unlike base Qwen2, `Alibaba-NLP/gte-Qwen2-7B-instruct` uses bi-directional attention.
-You can set `--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.
-
-On the other hand, its 1.5B variant (`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
-despite being described otherwise on its model card.
-
-Regardless of the variant, you need to enable `--trust-remote-code` for the correct tokenizer to be
-loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
+The HF implementation of `Alibaba-NLP/gte-Qwen2-1.5B-instruct` is hardcoded to use causal attention despite what is shown in `config.json`. To compare vLLM vs HF results,
+you should set `--hf-overrides '{"is_causal": true}'` in vLLM so that the two implementations are consistent with each other.
+
+For both the 1.5B and 7B variants, you also need to enable `--trust-remote-code` for the correct tokenizer to be loaded.
+See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
 :::
 
 If your model is not in the above list, we will try to automatically convert the model using
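For context, a minimal offline sketch (not part of this commit) of how the flags from the updated note map onto vLLM's Python API: the prompt string is made up, and the `hf_overrides` / `trust_remote_code` keyword arguments are the `LLM` equivalents of the CLI flags shown in the note.

# Sketch only (illustrative, not from this commit): offline equivalent of
#   vllm serve Alibaba-NLP/gte-Qwen2-1.5B-instruct --task embed \
#       --hf-overrides '{"is_causal": true}' --trust-remote-code
from vllm import LLM

llm = LLM(
    model="Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    task="embed",                      # run as an embedding model
    hf_overrides={"is_causal": True},  # match HF's hardcoded causal attention
    trust_remote_code=True,            # load the correct tokenizer
)

(output,) = llm.embed("Follow the white rabbit.")  # made-up prompt
print(len(output.outputs.embedding))               # embedding dimensionality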


@@ -42,8 +42,8 @@ def test_models(
     if model == "ssmits/Qwen2-7B-Instruct-embed-base":
         vllm_extra_kwargs["override_pooler_config"] = \
             PoolerConfig(pooling_type="MEAN")
-    if model == "Alibaba-NLP/gte-Qwen2-7B-instruct":
-        vllm_extra_kwargs["hf_overrides"] = {"is_causal": False}
+    if model == "Alibaba-NLP/gte-Qwen2-1.5B-instruct":
+        vllm_extra_kwargs["hf_overrides"] = {"is_causal": True}
 
     # The example_prompts has ending "\n", for example:
     # "Write a short story about a robot that dreams for the first time.\n"