Sanju C Sudhakaran
2880e21e3d
[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi ( #12812 )
...
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
2025-02-08 17:15:30 +08:00
wangxiyuan
407b5537db
[Build] Make pypi install work on CPU platform ( #12874 )
2025-02-08 01:15:15 -08:00
Woosuk Kwon
4ea48fb35c
[V1][Minor] Move cascade attn logic outside _prepare_inputs ( #12943 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-08 00:39:09 -08:00
Shaoting
e31498bdcb
[Misc] Add offline test for disaggregated prefill ( #12418 )
2025-02-08 08:38:20 +00:00
youkaichao
91dd8f7aa6
[bugfix] respect distributed_executor_backend in world_size=1 ( #12934 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-08 16:17:08 +08:00
zifeitong
d01f66b039
[Bugfix] Fix multi-round chat error when mistral tokenizer is used ( #12859 )
...
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-08 07:04:34 +00:00
Ke Zhao
cc01223f3b
[Misc] Fix typo in the example file ( #12896 )
...
Signed-off-by: Zhao Ke <yingxiongraomingzk@gmail.com>
2025-02-08 06:56:43 +00:00
Jee Jee Li
306923da82
[Bugfix] Fix Qwen2_5_VLForConditionalGeneration packed_modules_mapping ( #12905 )
2025-02-07 21:02:53 -08:00
Woosuk Kwon
3243158336
[V1] Move KV block hashes from Request to KVCacheManager ( #12922 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-07 19:14:10 -08:00
Woosuk Kwon
b21f0f9d17
[V1][Minor] Remove outdated comment ( #12928 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-07 19:07:37 -08:00
Lu Fang
45cbc4991d
[Bugfix] Fix disagg hang caused by the prefill and decode communication issues ( #12723 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-07 16:39:50 -08:00
Robert Shaw
932c6b7461
[V1] LM Eval With Streaming Integration Tests ( #11590 )
2025-02-07 15:07:03 -08:00
TJian
eaa92d4437
[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing ( #12501 )
2025-02-07 08:13:43 -08:00
afeldman-nm
0630d4537a
[V1] Logprobs and prompt logprobs support ( #9880 )
...
This PR adds support for sample logprobs and prompt logprobs to vLLM v1.
New behavior:
- During model execution, the model runner computes sample logprobs (if the user-provided logprobs setting is not None) and prompt logprobs (if the user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns three vectors: token ids, token logprob values, and token ranks. A rank is a token's 1-indexed position in the vocabulary after sorting the vocabulary by log probability in descending order.
- In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure, which is transferred to the engine client. If multiprocessing is enabled, sample and prompt logprobs are (de)serialized along with the EngineCoreOutput data structure.
- During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprob values, and token ranks into the OpenAI-compatible List[Dict[token_id, Logprob]] format (for sample and prompt logprobs, respectively).
- Each Logprob instance (whether sample or prompt) consists of a token's log-probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor, not the detokenizer.
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-07 07:26:20 -08:00
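The triplet-to-Logprob transformation described in the #9880 entry above can be sketched roughly as follows. This is a minimal illustration, not vLLM's actual implementation: the Logprob fields mirror the description, but the function name and the detokenize callback are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Logprob:
    logprob: float        # token's log-probability
    rank: int             # 1-indexed rank in the vocabulary sorted by logprob
    decoded_token: str    # detokenized string representation


def build_logprobs(
    token_ids: List[List[int]],
    logprob_values: List[List[float]],
    ranks: List[List[int]],
    detokenize: Callable[[int], str],
) -> List[Dict[int, Logprob]]:
    """Combine the per-position triplet of vectors returned by the engine
    core into the OpenAI-compatible List[Dict[token_id, Logprob]] format."""
    result: List[Dict[int, Logprob]] = []
    for ids, lps, rks in zip(token_ids, logprob_values, ranks):
        result.append({
            tid: Logprob(lp, rank, detokenize(tid))
            for tid, lp, rank in zip(ids, lps, rks)
        })
    return result
```

The same shape is produced for sample and prompt logprobs; only the source vectors differ.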
Amit Garg
538fab93cd
PR #12718 ( #12718 )
2025-02-07 06:22:37 -08:00
Cyrus Leung
ce26b16268
[Misc] Remove unnecessary detokenization in multimodal processing ( #12868 )
2025-02-07 06:21:17 -08:00
Lu Fang
1918aa1b80
[MISC][EASY] Break check file names into entry and args in the pre-commit hooks ( #12880 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-07 13:04:39 +00:00
Maximilien de Bayser
6e1fc61f0f
Prevent unnecessary requests to huggingface hub ( #12837 )
2025-02-06 21:37:41 -08:00
Szymon Ożóg
aa375dca9f
[Bugfix] Missing quant_config in deepseek embedding layer ( #12836 )
2025-02-06 21:35:09 -08:00
ZSL98
433c4a4923
Make vllm compatible with verl ( #12824 )
...
Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
2025-02-07 11:54:20 +08:00
Lucas Wilkinson
ef533d25fb
[Bugfix] FA2 illegal memory access ( #12848 )
2025-02-06 19:54:07 -08:00
Kevin H. Luu
b260782357
[misc] Revert #12833 ( #12857 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 16:29:12 -08:00
Lu Fang
741429a4cd
[MISC] Check space in the file names in the pre-commit checks ( #12804 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 15:36:21 -08:00
Yu Chin Fabian Lim
aff404571b
Add Bamba Model ( #10909 )
...
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-06 15:22:42 -08:00
Varun Sundar Rabindranath
467a96a541
[V1] LoRA Support ( #10957 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-06 09:32:51 -08:00
Isotr0py
8108ac841d
[Bugfix] Fix unsupported FA version check for Turing GPU ( #12828 )
2025-02-06 09:18:22 -08:00
Jitse Klomp
afe74f7a96
[Doc] double quote cmake package in build.inc.md ( #12840 )
2025-02-06 09:17:55 -08:00
youkaichao
09b95e36ab
[torch.compile] PyTorch 2.6 and nightly compatibility ( #12393 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-07 01:09:07 +08:00
Isotr0py
85ac82d228
[Kernel] Make rotary_embedding ops more flexible with input shape ( #12777 )
2025-02-06 08:46:13 -08:00
Cyrus Leung
1e57b1ee63
[Misc] Remove unnecessary decode call ( #12833 )
2025-02-06 08:45:44 -08:00
Kevin H. Luu
e152f29502
[misc] Reduce number of config file requests to HuggingFace ( #12797 )
...
Signed-off-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 14:59:18 +00:00
Lucas Wilkinson
c786e757fa
[Attention] Use FA3 for MLA on Hopper ( #12807 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-02-06 11:43:12 +00:00
Simon Mo
cefd56ee35
[Docs] Add Google Cloud Slides ( #12814 )
2025-02-06 01:02:38 -08:00
Dipika Sikka
7ca9934fe7
[Misc] Update w2 scale loading for GPTQMarlinMoE ( #12757 )
2025-02-06 01:02:14 -08:00
youkaichao
0408efc6d0
[Misc] Improve error message for incorrect pynvml ( #12809 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 15:23:50 +08:00
Michael Goin
449d1bce02
[Misc] Remove duplicated DeepSeek V2/V3 model definition ( #12793 )
2025-02-05 23:16:20 -08:00
Harry Mellor
1a6fcad4c9
Improve TransformersModel UX ( #12785 )
2025-02-05 22:24:57 -08:00
Lu Fang
56534cd577
[Bugfix] Fix the test_ultravox.py's license ( #12806 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 13:25:54 +08:00
Sumit Vij
d88506dda4
[Model] LoRA Support for Ultravox model ( #11253 )
2025-02-05 19:54:13 -08:00
Lu Fang
9cdea30b4f
[Misc][Easy] Remove the space from the file name
2025-02-05 19:23:35 -08:00
Lucas Wilkinson
76abd0c881
[Bugfix] Better FP8 supported defaults
2025-02-05 19:22:19 -08:00
Gregory Shtrasberg
5b19b93082
[ROCm][Kernel] Using the correct warp_size value
2025-02-05 19:15:08 -08:00
Cyrus Leung
75404d041b
[VLM] Update compatibility with transformers 4.49
2025-02-05 19:09:45 -08:00
Roger Wang
bf3b79efb8
[VLM] Qwen2.5-VL
2025-02-05 13:31:38 -08:00
Russell Bryant
9a5b1554b4
[Docs] Drop duplicate [source] links
2025-02-05 13:30:50 -08:00
Cyrus Leung
a4ce74c14a
[VLM] Use shared field to pass token ids to model
2025-02-05 13:30:46 -08:00
Rahul Tuli
3b2005e1db
Add: Support for Sparse24Bitmask Compressed Models
2025-02-05 13:30:43 -08:00
Sanju C Sudhakaran
af8486de49
[Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU)
2025-02-05 13:29:45 -08:00
Chen Zhang
4c3aac51e1
Merging PR #12536
...
Merged via CLI script
2025-02-05 13:24:26 -08:00
youkaichao
bc1bdecebf
[core][distributed] exact ray placement control ( #12732 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 02:03:19 +08:00