20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	a31614e386	[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-02-27 10:39:10 +08:00
Lucas Wilkinson	f95903909f	[Kernel] FlashMLA integration (#13747 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-02-27 10:35:08 +08:00
Woosuk Kwon	b382a7f28f	[BugFix] Make FP8 Linear compatible with torch.compile (#13918 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-26 13:48:55 -08:00
Wallas Henrique	4cb6fa0a9c	[Bugfix] Backend option to disable xgrammar any_whitespace (#12744 ) Signed-off-by: Wallas Santos <wallashss@ibm.com> Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-26 10:52:34 -08:00
Chauncey	d08b285adf	[Misc] fixed qwen_vl_utils parameter error (#13906 )	2025-02-26 08:31:53 -08:00
Chenyaaang	b27122acc2	[TPU] use torch2.6 with whl package (#13860 ) Signed-off-by: Chenyaaang <llccyy1212@gmail.com>	2025-02-26 08:18:54 -05:00
Cyrus Leung	934bb99c71	[Bugfix] Update expected token counts for Ultravox tests (#13895 )	2025-02-26 04:56:50 -08:00
Joe Runde	3f808cc044	[Bugfix] Do not crash V0 engine on input errors (#13101 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-26 19:07:29 +08:00
Brayden Zhong	ec8a5e5386	[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-02-26 19:06:47 +08:00
Florian Greinacher	215bf150a6	[Bugfix] Handle None parameters in Mistral function calls. (#13786 )	2025-02-26 03:06:21 -08:00
Harry Mellor	0ecdd98031	Add comments on accessing `kv_cache` and `attn_metadata` (#13887 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-26 18:41:02 +08:00
Cyrus Leung	7b700ec8c8	[Bugfix] Add test example for Ultravox v0.5 (#13890 )	2025-02-26 02:31:43 -08:00
Roger Wang	7ca1da020f	[Misc] Fix input processing for Ultravox (#13871 )	2025-02-25 23:56:34 -08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Seth Kimmel	e206b54331	[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine (#13837 ) Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>	2025-02-26 14:58:24 +08:00
Sage Moore	1d35662e6d	[ROCm] Disable chunked prefill/prefix caching when running MLA on non-cuda platforms (#13844 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-02-26 14:56:58 +08:00
Albert	e656f638de	[Doc] fix the incorrect module path of tensorize_vllm_model (#13863 )	2025-02-25 22:56:19 -08:00
Harry Mellor	145944cb94	Improve pipeline partitioning (#13839 )	2025-02-25 18:53:56 -08:00
Henry Tsang	094b7d9496	[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues (#13797 )	2025-02-25 18:52:03 -08:00
Chenguang Li	e1fe7591f2	[Misc]Code Cleanup (#13859 ) Signed-off-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2025-02-26 10:44:30 +08:00
Lily Liu	5629f26df7	[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729 )	2025-02-25 18:14:48 -08:00
Rui Qiao	9ba28043b5	[misc] Show driver IP info when Ray fails to allocate driver worker (#13858 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-02-26 09:53:43 +08:00
Harry Mellor	24679788ed	DeepSeek V2/V3/R1 only place `lm_head` on last pp rank (#13833 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-26 01:24:57 +00:00
Michael Goin	07c4353057	[Model] Support Grok1 (#13795 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-26 01:07:12 +00:00
Harry Mellor	34e3494e70	Fix failing `MyGemma2Embedding` test (#13820 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-25 12:33:03 -08:00
Liangfu Chen	f75aa72732	[Neuron] Add custom_ops for neuron backend (#13246 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: George Novack <gnovack@amazon.com> Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>	2025-02-25 11:47:49 -08:00
Chen1022	340e39e387	Fix string parsing error (#13825 )	2025-02-25 08:20:29 -08:00
Cyrus Leung	f4133ce4e5	[Bugfix] Revert inspection code in #13743 (#13832 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-26 00:18:50 +08:00
Wen Sun	6522d55b6f	Fix `/v1/audio/transcriptions` Bad Request Error (#13811 )	2025-02-25 06:03:33 -08:00
Isotr0py	6ff518626c	[Bugfix] Fix deepseek-vl2 inference with more than 2 images (#13818 )	2025-02-25 06:03:02 -08:00
Nichols A. Romero	fa82074167	[Bugfix] Flush TunableOp results before worker processes are destroyed. (#13623 ) Signed-off-by: Nichols A. Romero <nick.romero@amd.com>	2025-02-25 11:08:20 +00:00
Junlin Zhou	75e9d49796	[Bugfix] Initialize attention bias on the same device as Query/Key/Value (#13468 )	2025-02-25 02:13:09 -08:00
Chen1022	32c3b6bfd1	[Misc]Clarify Error Handling for Non-existent Model Paths and HF Repo IDs (#13724 ) Signed-off-by: Chen-0210 <chenjincong11@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-02-25 10:12:19 +00:00
Jee Jee Li	37b6cb4985	[CI/Build] Fix V1 LoRA failure (#13767 )	2025-02-25 02:01:15 -08:00
Gregory Shtrasberg	aabeb2688f	[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593 )	2025-02-25 00:39:59 -08:00
Jiayi Yao	2f42a4888c	[Feature] Support KV cache offloading and disagg prefill with LMCache connector. (#12953 )	2025-02-25 00:38:42 -08:00
Rui Qiao	3173c3b34e	[misc] Clean up ray compiled graph type hints (#13731 )	2025-02-25 00:37:08 -08:00
Shanshan Shen	2d87d7d1ac	[Bugfix] Modify modelscope api usage in transformer_utils (#13807 )	2025-02-25 00:36:07 -08:00
Russell Bryant	aab392774b	[Core] xgrammar: Expand list of unsupported jsonschema keywords (#13783 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-25 08:21:25 +00:00
Cyrus Leung	6724e79164	[Misc] Check that the model can be inspected upon registration (#13743 )	2025-02-25 00:18:19 -08:00
Varun Sundar Rabindranath	03f48b3db6	[Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705 )	2025-02-25 00:18:02 -08:00
Michael Goin	4d251ad00e	Fix CompressedTensorsWNA16MoE with grouped scales (#13769 )	2025-02-25 00:17:14 -08:00
Michael Goin	18e505930d	[Bugfix] Support MLA for CompressedTensorsWNA16 (#13725 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-25 06:10:31 +00:00
Lucas Wilkinson	4a8cfc7551	[Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" (#13802 )	2025-02-24 20:33:59 -08:00
Mark McLoughlin	bc32bc73aa	[V1][Metrics] Implement vllm:lora_requests_info metric (#13504 )	2025-02-24 20:01:33 -08:00
wangxiyuan	ab1091d5f2	[Misc][Attention][Quantization] init property earlier (#13733 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-25 03:19:30 +00:00
Tyler Michael Smith	1e15aaef56	[Bugfix][Quantization] Fix FP8 + EP (#13784 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-25 10:54:17 +08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Eli Boyarski	7196a3b1db	[Doc] arg_utils.py: fixed a typo (#13785 )	2025-02-24 18:23:04 -08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555 )	2025-02-24 17:13:52 -08:00

4826 Commits