20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
youkaichao	59fff4a01a	[core] improve error handling when wake up from sleep mode (#12981 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-10 09:38:57 +08:00
Lu Fang	29f1d47e73	[MISC] Always import version library first in the vllm package (#12979 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-09 18:56:40 +08:00
youkaichao	cf797aa856	[core] port pynvml into vllm codebase (#12963 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-09 15:00:00 +08:00
Woosuk Kwon	24700c346b	[V1] Cache `uses_mrope` in GPUModelRunner (#12969 )	2025-02-08 15:32:32 -08:00
Patrick von Platen	d366ccc4e3	[RFC] [Mistral] FP8 format (#10130 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-02-08 14:12:53 -07:00
Woosuk Kwon	870c37481e	[V1][Minor] Remove outdated comment (#12968 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-08 12:48:30 -08:00
Jee Jee Li	86222a3dab	[VLM] Merged multi-modal processor for GLM4V (#12449 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-02-08 20:32:16 +00:00
youkaichao	fe743b798d	[bugfix] fix early import of flash attention (#12959 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-09 00:06:56 +08:00
shangmingc	913df14da3	[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU (#12935 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-08 14:46:19 +00:00
Cyrus Leung	8a69e0e20e	[CI/Build] Auto-fix Markdown files (#12941 )	2025-02-08 04:25:15 -08:00
Isotr0py	4c8dd12ef3	[Misc] Add qwen2.5-vl BNB support (#12944 )	2025-02-08 04:24:47 -08:00
Jun Duan	256a2d29dc	[Doc] Correct HF repository for TeleChat2 models (#12949 )	2025-02-08 01:42:15 -08:00
Liangfu Chen	c45d398e6f	[CI] Resolve transformers-neuronx version conflict (#12925 )	2025-02-08 01:41:35 -08:00
Jun Duan	011e612d92	[Misc] Log time consumption on weight downloading (#12926 )	2025-02-08 09:16:42 +00:00
Varun Sundar Rabindranath	7e1837676a	[misc] Add LoRA to benchmark_serving (#12898 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-08 17:15:44 +08:00
Sanju C Sudhakaran	2880e21e3d	[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi (#12812 ) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>	2025-02-08 17:15:30 +08:00
wangxiyuan	407b5537db	[Build] Make pypi install work on CPU platform (#12874 )	2025-02-08 01:15:15 -08:00
Woosuk Kwon	4ea48fb35c	[V1][Minor] Move cascade attn logic outside _prepare_inputs (#12943 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-08 00:39:09 -08:00
Shaoting	e31498bdcb	[Misc] Add offline test for disaggregated prefill (#12418 )	2025-02-08 08:38:20 +00:00
youkaichao	91dd8f7aa6	[bugfix] respect distributed_executor_backend in world_size=1 (#12934 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-08 16:17:08 +08:00
zifeitong	d01f66b039	[Bugfix] Fix multi-round chat error when mistral tokenizer is used (#12859 ) Signed-off-by: Zifei Tong <zifeitong@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-02-08 07:04:34 +00:00
Ke Zhao	cc01223f3b	[Misc] Fix typo in the example file (#12896 ) Signed-off-by: Zhao Ke <yingxiongraomingzk@gmail.com>	2025-02-08 06:56:43 +00:00
Jee Jee Li	306923da82	[Bugfix] Fix Qwen2_5_VLForConditionalGeneration packed_modules_mapping (#12905 )	2025-02-07 21:02:53 -08:00
Woosuk Kwon	3243158336	[V1] Move KV block hashes from Request to KVCacheManager (#12922 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-07 19:14:10 -08:00
Woosuk Kwon	b21f0f9d17	[V1][Minor] Remove outdated comment (#12928 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-07 19:07:37 -08:00
Lu Fang	45cbc4991d	[Bugfix] Fix disagg hang caused by the prefill and decode communication issues (#12723 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-07 16:39:50 -08:00
Robert Shaw	932c6b7461	[V1] LM Eval With Streaming Integration Tests (#11590 )	2025-02-07 15:07:03 -08:00
TJian	eaa92d4437	[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501 )	2025-02-07 08:13:43 -08:00
afeldman-nm	0630d4537a	[V1] Logprobs and prompt logprobs support (#9880 ) This PR is adding support for sample logprobs & prompt logprobs to vLLM v1. New behavior: - During model execution, model runner computes sample logprobs (if user-provided logprobs setting is not None) and prompt logprobs (if user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns 3 vectors: token ids, token logprob values, token ranks. Ranks reflect tokens' 1-indexed positions in the vocabulary vector after sorting the vocabulary by log probability in descending order. - In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure which is transferred to the engine client. If multiprocessing is enabled, then sample and prompt logprobs will be (de)serialized when the EngineCoreOutput data structure is (de)serialized. - During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprobs values, and token ranks into the OpenAI-compatible List[Dict[token id,Logprob]] format (for sample and prompt logprobs respectively.) - Each Logprob instance (whether sample- or prompt-) consists of a token's log-probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor not the detokenizer. Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-07 07:26:20 -08:00
Amit Garg	538fab93cd	PR #12718 (#12718 )	2025-02-07 06:22:37 -08:00
Cyrus Leung	ce26b16268	[Misc] Remove unnecessary detokenization in multimodal processing (#12868 )	2025-02-07 06:21:17 -08:00
Lu Fang	1918aa1b80	[MISC][EASY] Break check file names into entry and args in the pre-commit hooks (#12880 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-07 13:04:39 +00:00
Maximilien de Bayser	6e1fc61f0f	Prevent unecessary requests to huggingface hub (#12837 )	2025-02-06 21:37:41 -08:00
Szymon Ożóg	aa375dca9f	[Bugfix] Missing quant_config in deepseek embedding layer (#12836 )	2025-02-06 21:35:09 -08:00
ZSL98	433c4a4923	Make vllm compatible with verl (#12824 ) Co-authored-by: zhangshulai <zhangshulai@bytedance.com>	2025-02-07 11:54:20 +08:00
Lucas Wilkinson	ef533d25fb	[Bugfix] FA2 illegal memory access (#12848 )	2025-02-06 19:54:07 -08:00
Kevin H. Luu	b260782357	[misc] Revert # 12833 (#12857 ) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-06 16:29:12 -08:00
Lu Fang	741429a4cd	[MISC] Check space in the file names in the pre commit checks (#12804 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-06 15:36:21 -08:00
Yu Chin Fabian Lim	aff404571b	Add Bamba Model (#10909 ) Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-06 15:22:42 -08:00
Varun Sundar Rabindranath	467a96a541	[V1] LoRA Support (#10957 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-06 09:32:51 -08:00
Isotr0py	8108ac841d	[Bugfix] Fix unsupported FA version check for Turing GPU (#12828 )	2025-02-06 09:18:22 -08:00
Jitse Klomp	afe74f7a96	[Doc] double quote cmake package in build.inc.md (#12840 )	2025-02-06 09:17:55 -08:00
youkaichao	09b95e36ab	[torch.compile] PyTorch 2.6 and nightly compatibility (#12393 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-07 01:09:07 +08:00
Isotr0py	85ac82d228	[Kernel] Make rotary_embedding ops more flexible with input shape (#12777 )	2025-02-06 08:46:13 -08:00
Cyrus Leung	1e57b1ee63	[Misc] Remove unnecessary decode call (#12833 )	2025-02-06 08:45:44 -08:00
Kevin H. Luu	e152f29502	[misc] Reduce number of config file requests to HuggingFace (#12797 ) Signed-off-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal> Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-06 14:59:18 +00:00
Lucas Wilkinson	c786e757fa	[Attention] Use FA3 for MLA on Hopper (#12807 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-02-06 11:43:12 +00:00
Simon Mo	cefd56ee35	[Docs] Add Google Cloud Slides (#12814 )	2025-02-06 01:02:38 -08:00
Dipika Sikka	7ca9934fe7	[Misc] Update w2 scale loading for GPTQMarlinMoE (#12757 )	2025-02-06 01:02:14 -08:00
youkaichao	0408efc6d0	[Misc] Improve error message for incorrect pynvml (#12809 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-06 15:23:50 +08:00
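
Note on commit 0630d4537a ([V1] Logprobs and prompt logprobs support, #9880): its message describes ranks as a token's 1-indexed position after sorting the vocabulary by log probability in descending order, and describes converting a triplet of token ids, logprob values, and ranks into a list of per-position dicts. The sketch below illustrates only that description. It is not vLLM's code: the `Logprob` dataclass here is a local stand-in, `token_rank` and `to_list_of_dicts` are hypothetical helper names, and giving tied tokens the same rank is an assumption the commit message does not address.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Logprob:
    """Local stand-in for illustration; not imported from vLLM."""
    logprob: float
    rank: int
    decoded_token: Optional[str] = None


def token_rank(vocab_logprobs: List[float], token_id: int) -> int:
    """1-indexed position of token_id when the vocabulary is sorted by
    log probability in descending order (ties share a rank: assumption)."""
    target = vocab_logprobs[token_id]
    return 1 + sum(1 for lp in vocab_logprobs if lp > target)


def to_list_of_dicts(
    token_ids: List[List[int]],
    logprob_values: List[List[float]],
    ranks: List[List[int]],
    vocab: List[str],
) -> List[Dict[int, Logprob]]:
    """Turn the (ids, values, ranks) triplet into one dict per position,
    keyed by token id. Decoding is a plain vocab lookup here (hypothetical)."""
    result: List[Dict[int, Logprob]] = []
    for ids, values, rks in zip(token_ids, logprob_values, ranks):
        result.append({
            tid: Logprob(logprob=lp, rank=rk, decoded_token=vocab[tid])
            for tid, lp, rk in zip(ids, values, rks)
        })
    return result


# Toy 4-token vocabulary with a made-up distribution for one position.
vocab = ["a", "b", "c", "d"]
vocab_logprobs = [math.log(p) for p in (0.1, 0.6, 0.2, 0.1)]

top_ids = [1, 2]  # the two most likely tokens at this position
position_ranks = [token_rank(vocab_logprobs, t) for t in top_ids]
converted = to_list_of_dicts(
    [top_ids],
    [[vocab_logprobs[t] for t in top_ids]],
    [position_ranks],
    vocab,
)
```

Here `converted` holds a single dict for the one position, mapping token ids 1 and 2 to `Logprob` entries with ranks 1 and 2. A real implementation would compute ranks on tensors rather than Python lists.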

4504 Commits