Sanju C Sudhakaran
2880e21e3d
[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi ( #12812 )
...
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
2025-02-08 17:15:30 +08:00
wangxiyuan
407b5537db
[Build] Make pypi install work on CPU platform ( #12874 )
2025-02-08 01:15:15 -08:00
Woosuk Kwon
4ea48fb35c
[V1][Minor] Move cascade attn logic outside _prepare_inputs ( #12943 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-08 00:39:09 -08:00
Shaoting
e31498bdcb
[Misc] Add offline test for disaggregated prefill ( #12418 )
2025-02-08 08:38:20 +00:00
youkaichao
91dd8f7aa6
[bugfix] respect distributed_executor_backend in world_size=1 ( #12934 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-08 16:17:08 +08:00
zifeitong
d01f66b039
[Bugfix] Fix multi-round chat error when mistral tokenizer is used ( #12859 )
...
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-08 07:04:34 +00:00
Ke Zhao
cc01223f3b
[Misc] Fix typo in the example file ( #12896 )
...
Signed-off-by: Zhao Ke <yingxiongraomingzk@gmail.com>
2025-02-08 06:56:43 +00:00
Jee Jee Li
306923da82
[Bugfix] Fix Qwen2_5_VLForConditionalGeneration packed_modules_mapping ( #12905 )
2025-02-07 21:02:53 -08:00
Woosuk Kwon
3243158336
[V1] Move KV block hashes from Request to KVCacheManager ( #12922 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-07 19:14:10 -08:00
Woosuk Kwon
b21f0f9d17
[V1][Minor] Remove outdated comment ( #12928 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-07 19:07:37 -08:00
Lu Fang
45cbc4991d
[Bugfix] Fix disagg hang caused by the prefill and decode communication issues ( #12723 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-07 16:39:50 -08:00
Robert Shaw
932c6b7461
[V1] LM Eval With Streaming Integration Tests ( #11590 )
2025-02-07 15:07:03 -08:00
TJian
eaa92d4437
[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing ( #12501 )
2025-02-07 08:13:43 -08:00
afeldman-nm
0630d4537a
[V1] Logprobs and prompt logprobs support ( #9880 )
...
This PR adds support for sample logprobs and prompt logprobs to vLLM v1.
New behavior:
- During model execution, the model runner computes sample logprobs (if the user-provided logprobs setting is not None) and prompt logprobs (if the user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns three vectors: token ids, token logprob values, and token ranks. A rank is a token's 1-indexed position in the vocabulary after sorting the vocabulary by log probability in descending order.
- In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure, which is transferred to the engine client. If multiprocessing is enabled, sample and prompt logprobs are (de)serialized along with the EngineCoreOutput data structure.
- During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprob values, and token ranks into the OpenAI-compatible List[Dict[token_id, Logprob]] format (for sample and prompt logprobs, respectively).
- Each Logprob instance (whether sample or prompt) consists of a token's log-probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor, not the detokenizer.
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-07 07:26:20 -08:00
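The triplet-to-Logprob transformation described in the #9880 entry above can be sketched roughly as follows. This is a minimal illustration, not vLLM's actual implementation: the Logprob fields mirror the description, but the function name and the detokenize callback are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Logprob:
    logprob: float        # token's log-probability
    rank: int             # 1-indexed rank in the vocabulary sorted by logprob
    decoded_token: str    # detokenized string representation


def build_logprobs(
    token_ids: List[List[int]],
    logprob_values: List[List[float]],
    ranks: List[List[int]],
    detokenize: Callable[[int], str],
) -> List[Dict[int, Logprob]]:
    """Combine the per-position triplet of vectors returned by the engine
    core into the OpenAI-compatible List[Dict[token_id, Logprob]] format."""
    result: List[Dict[int, Logprob]] = []
    for ids, lps, rks in zip(token_ids, logprob_values, ranks):
        result.append({
            tid: Logprob(lp, rank, detokenize(tid))
            for tid, lp, rank in zip(ids, lps, rks)
        })
    return result
```

The same shape is produced for sample and prompt logprobs; only the source vectors differ.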
Amit Garg
538fab93cd
PR #12718 ( #12718 )
2025-02-07 06:22:37 -08:00
Cyrus Leung
ce26b16268
[Misc] Remove unnecessary detokenization in multimodal processing ( #12868 )
2025-02-07 06:21:17 -08:00
Lu Fang
1918aa1b80
[MISC][EASY] Break check file names into entry and args in the pre-commit hooks ( #12880 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-07 13:04:39 +00:00
Maximilien de Bayser
6e1fc61f0f
Prevent unnecessary requests to huggingface hub ( #12837 )
2025-02-06 21:37:41 -08:00
Szymon Ożóg
aa375dca9f
[Bugfix] Missing quant_config in deepseek embedding layer ( #12836 )
2025-02-06 21:35:09 -08:00
ZSL98
433c4a4923
Make vllm compatible with verl ( #12824 )
...
Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
2025-02-07 11:54:20 +08:00
Lucas Wilkinson
ef533d25fb
[Bugfix] FA2 illegal memory access ( #12848 )
2025-02-06 19:54:07 -08:00
Kevin H. Luu
b260782357
[misc] Revert #12833 ( #12857 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 16:29:12 -08:00
Lu Fang
741429a4cd
[MISC] Check space in the file names in the pre-commit checks ( #12804 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 15:36:21 -08:00
Yu Chin Fabian Lim
aff404571b
Add Bamba Model ( #10909 )
...
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-06 15:22:42 -08:00
Varun Sundar Rabindranath
467a96a541
[V1] LoRA Support ( #10957 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-06 09:32:51 -08:00
Isotr0py
8108ac841d
[Bugfix] Fix unsupported FA version check for Turing GPU ( #12828 )
2025-02-06 09:18:22 -08:00
Jitse Klomp
afe74f7a96
[Doc] double quote cmake package in build.inc.md ( #12840 )
2025-02-06 09:17:55 -08:00
youkaichao
09b95e36ab
[torch.compile] PyTorch 2.6 and nightly compatibility ( #12393 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-07 01:09:07 +08:00
Isotr0py
85ac82d228
[Kernel] Make rotary_embedding ops more flexible with input shape ( #12777 )
2025-02-06 08:46:13 -08:00
Cyrus Leung
1e57b1ee63
[Misc] Remove unnecessary decode call ( #12833 )
2025-02-06 08:45:44 -08:00
Kevin H. Luu
e152f29502
[misc] Reduce number of config file requests to HuggingFace ( #12797 )
...
Signed-off-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 14:59:18 +00:00
Lucas Wilkinson
c786e757fa
[Attention] Use FA3 for MLA on Hopper ( #12807 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-02-06 11:43:12 +00:00
Simon Mo
cefd56ee35
[Docs] Add Google Cloud Slides ( #12814 )
2025-02-06 01:02:38 -08:00
Dipika Sikka
7ca9934fe7
[Misc] Update w2 scale loading for GPTQMarlinMoE ( #12757 )
2025-02-06 01:02:14 -08:00
youkaichao
0408efc6d0
[Misc] Improve error message for incorrect pynvml ( #12809 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 15:23:50 +08:00
Michael Goin
449d1bce02
[Misc] Remove duplicated DeepSeek V2/V3 model definition ( #12793 )
2025-02-05 23:16:20 -08:00
Harry Mellor
1a6fcad4c9
Improve TransformersModel UX ( #12785 )
2025-02-05 22:24:57 -08:00
Lu Fang
56534cd577
[Bugfix] Fix the test_ultravox.py's license ( #12806 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 13:25:54 +08:00
Sumit Vij
d88506dda4
[Model] LoRA Support for Ultravox model ( #11253 )
2025-02-05 19:54:13 -08:00
Lu Fang
9cdea30b4f
[Misc][Easy] Remove the space from the file name
2025-02-05 19:23:35 -08:00
Lucas Wilkinson
76abd0c881
[Bugfix] Better FP8 supported defaults
2025-02-05 19:22:19 -08:00
Gregory Shtrasberg
5b19b93082
[ROCm][Kernel] Using the correct warp_size value
2025-02-05 19:15:08 -08:00
Cyrus Leung
75404d041b
[VLM] Update compatibility with transformers 4.49
2025-02-05 19:09:45 -08:00
Roger Wang
bf3b79efb8
[VLM] Qwen2.5-VL
2025-02-05 13:31:38 -08:00
Russell Bryant
9a5b1554b4
[Docs] Drop duplicate [source] links
2025-02-05 13:30:50 -08:00
Cyrus Leung
a4ce74c14a
[VLM] Use shared field to pass token ids to model
2025-02-05 13:30:46 -08:00
Rahul Tuli
3b2005e1db
Add: Support for Sparse24Bitmask Compressed Models
2025-02-05 13:30:43 -08:00
Sanju C Sudhakaran
af8486de49
[Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU)
2025-02-05 13:29:45 -08:00
Chen Zhang
4c3aac51e1
Merging PR #12536
...
Merged via CLI script
2025-02-05 13:24:26 -08:00
youkaichao
bc1bdecebf
[core][distributed] exact ray placement control ( #12732 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 02:03:19 +08:00