20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Henry Tsang	094b7d9496	[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues (#13797)	2025-02-25 18:52:03 -08:00
Chenguang Li	e1fe7591f2	[Misc]Code Cleanup (#13859) Signed-off-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2025-02-26 10:44:30 +08:00
Lily Liu	5629f26df7	[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729)	2025-02-25 18:14:48 -08:00
Rui Qiao	9ba28043b5	[misc] Show driver IP info when Ray fails to allocate driver worker (#13858) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-02-26 09:53:43 +08:00
Harry Mellor	24679788ed	DeepSeek V2/V3/R1 only place `lm_head` on last pp rank (#13833) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-26 01:24:57 +00:00
Michael Goin	07c4353057	[Model] Support Grok1 (#13795) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-26 01:07:12 +00:00
Harry Mellor	34e3494e70	Fix failing `MyGemma2Embedding` test (#13820) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-25 12:33:03 -08:00
Liangfu Chen	f75aa72732	[Neuron] Add custom_ops for neuron backend (#13246) Signed-off-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: George Novack <gnovack@amazon.com> Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>	2025-02-25 11:47:49 -08:00
Chen1022	340e39e387	Fix string parsing error (#13825)	2025-02-25 08:20:29 -08:00
Cyrus Leung	f4133ce4e5	[Bugfix] Revert inspection code in #13743 (#13832) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-26 00:18:50 +08:00
Wen Sun	6522d55b6f	Fix `/v1/audio/transcriptions` Bad Request Error (#13811)	2025-02-25 06:03:33 -08:00
Isotr0py	6ff518626c	[Bugfix] Fix deepseek-vl2 inference with more than 2 images (#13818)	2025-02-25 06:03:02 -08:00
Nichols A. Romero	fa82074167	[Bugfix] Flush TunableOp results before worker processes are destroyed. (#13623) Signed-off-by: Nichols A. Romero <nick.romero@amd.com>	2025-02-25 11:08:20 +00:00
Junlin Zhou	75e9d49796	[Bugfix] Initialize attention bias on the same device as Query/Key/Value (#13468)	2025-02-25 02:13:09 -08:00
Chen1022	32c3b6bfd1	[Misc]Clarify Error Handling for Non-existent Model Paths and HF Repo IDs (#13724) Signed-off-by: Chen-0210 <chenjincong11@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-02-25 10:12:19 +00:00
Jee Jee Li	37b6cb4985	[CI/Build] Fix V1 LoRA failure (#13767)	2025-02-25 02:01:15 -08:00
Gregory Shtrasberg	aabeb2688f	[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593)	2025-02-25 00:39:59 -08:00
Jiayi Yao	2f42a4888c	[Feature] Support KV cache offloading and disagg prefill with LMCache connector. (#12953)	2025-02-25 00:38:42 -08:00
Rui Qiao	3173c3b34e	[misc] Clean up ray compiled graph type hints (#13731)	2025-02-25 00:37:08 -08:00
Shanshan Shen	2d87d7d1ac	[Bugfix] Modify modelscope api usage in transformer_utils (#13807)	2025-02-25 00:36:07 -08:00
Russell Bryant	aab392774b	[Core] xgrammar: Expand list of unsupported jsonschema keywords (#13783) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-25 08:21:25 +00:00
Cyrus Leung	6724e79164	[Misc] Check that the model can be inspected upon registration (#13743)	2025-02-25 00:18:19 -08:00
Varun Sundar Rabindranath	03f48b3db6	[Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705)	2025-02-25 00:18:02 -08:00
Michael Goin	4d251ad00e	Fix CompressedTensorsWNA16MoE with grouped scales (#13769)	2025-02-25 00:17:14 -08:00
Michael Goin	18e505930d	[Bugfix] Support MLA for CompressedTensorsWNA16 (#13725) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-25 06:10:31 +00:00
Lucas Wilkinson	4a8cfc7551	[Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" (#13802)	2025-02-24 20:33:59 -08:00
Mark McLoughlin	bc32bc73aa	[V1][Metrics] Implement vllm:lora_requests_info metric (#13504)	2025-02-24 20:01:33 -08:00
wangxiyuan	ab1091d5f2	[Misc][Attention][Quantization] init property earlier (#13733) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-25 03:19:30 +00:00
Tyler Michael Smith	1e15aaef56	[Bugfix][Quantization] Fix FP8 + EP (#13784) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-25 10:54:17 +08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Eli Boyarski	7196a3b1db	[Doc] arg_utils.py: fixed a typo (#13785)	2025-02-24 18:23:04 -08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555)	2025-02-24 17:13:52 -08:00
Robert Shaw	f61528d46d	[Misc][Chore] Clean Up `AsyncOutputProcessing` Logs (#13780)	2025-02-24 16:39:07 -08:00
Robert Shaw	1f0ae3ed0a	[Misc] Clean Up `EngineArgs.create_engine_config` (#13734) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-02-24 13:52:21 -05:00
Michael Goin	db986c19ea	Fix precommit fail in fused_moe intermediate_cache2 chunking (#13772) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-24 09:25:47 -08:00
Roger Wang	227578480d	Revert "[V1][Core] Fix memory issue with logits & sampling" (#13775)	2025-02-24 09:16:05 -08:00
afeldman-nm	befc402d34	[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-24 08:29:41 -08:00
Nicolò Lucchesi	444b0f0f62	[Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-02-24 10:43:21 -05:00
Zhonghua Deng	ccc00515fd	[BugFix] Illegal memory access for MoE On H20 (#13693)	2025-02-24 07:37:32 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583)	2025-02-24 07:33:20 -08:00
Roger Meier	7940d8a6a7	[CI/Build] add python-json-logger to requirements-common (#12842)	2025-02-24 06:10:33 -08:00
Roger Meier	c0e3ecd6d2	[Bugfix] fix(logging): add missing opening square bracket (#13011)	2025-02-24 06:10:25 -08:00
Mengqing Cao	23eca9cf68	[model][refactor] remove cuda hard code in models and layers (#13658)	2025-02-24 06:10:14 -08:00
Roger Wang	437b76ff59	[V1][Core] Fix memory issue with logits & sampling (#13721)	2025-02-24 06:10:06 -08:00
Kevin H. Luu	f90a375593	[ci] Add logic to change model to S3 path only when S3 CI env var is on (#13727) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>	2025-02-24 06:32:11 +00:00
Huy Do	e7ef74e26e	Fix some issues with benchmark data output (#13641) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-24 10:23:18 +08:00
Nick Hill	cbae7af552	[V1][BugFix] Fix engine core client shutdown hangs (#13298) Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method. Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-23 13:07:43 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
Roger Wang	9bebc9512f	[Misc] Deprecate `--dataset` from `benchmark_serving.py` (#13708) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-23 13:32:20 +00:00
Nick Hill	5a2ba16f5c	[Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms (#13688)	2025-02-23 02:54:29 -08:00

4808 Commits