ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
|
a31614e386
|
[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
|
2025-02-27 10:39:10 +08:00 |
|
Lucas Wilkinson
|
f95903909f
|
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-02-27 10:35:08 +08:00 |
|
Woosuk Kwon
|
b382a7f28f
|
[BugFix] Make FP8 Linear compatible with torch.compile (#13918)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-26 13:48:55 -08:00 |
|
Wallas Henrique
|
4cb6fa0a9c
|
[Bugfix] Backend option to disable xgrammar any_whitespace (#12744)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-26 10:52:34 -08:00 |
|
Chauncey
|
d08b285adf
|
[Misc] fixed qwen_vl_utils parameter error (#13906)
|
2025-02-26 08:31:53 -08:00 |
|
Chenyaaang
|
b27122acc2
|
[TPU] use torch2.6 with whl package (#13860)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
|
2025-02-26 08:18:54 -05:00 |
|
Cyrus Leung
|
934bb99c71
|
[Bugfix] Update expected token counts for Ultravox tests (#13895)
|
2025-02-26 04:56:50 -08:00 |
|
Joe Runde
|
3f808cc044
|
[Bugfix] Do not crash V0 engine on input errors (#13101)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-26 19:07:29 +08:00 |
|
Brayden Zhong
|
ec8a5e5386
|
[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-02-26 19:06:47 +08:00 |
|
Florian Greinacher
|
215bf150a6
|
[Bugfix] Handle None parameters in Mistral function calls. (#13786)
|
2025-02-26 03:06:21 -08:00 |
|
Harry Mellor
|
0ecdd98031
|
Add comments on accessing kv_cache and attn_metadata (#13887)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-26 18:41:02 +08:00 |
|
Cyrus Leung
|
7b700ec8c8
|
[Bugfix] Add test example for Ultravox v0.5 (#13890)
|
2025-02-26 02:31:43 -08:00 |
|
Roger Wang
|
7ca1da020f
|
[Misc] Fix input processing for Ultravox (#13871)
|
2025-02-25 23:56:34 -08:00 |
|
Jee Jee Li
|
5157338ed9
|
[Misc] Improve LoRA spelling (#13831)
|
2025-02-25 23:43:01 -08:00 |
|
Seth Kimmel
|
e206b54331
|
[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine (#13837)
Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>
|
2025-02-26 14:58:24 +08:00 |
|
Sage Moore
|
1d35662e6d
|
[ROCm] Disable chunked prefill/prefix caching when running MLA on non-cuda platforms (#13844)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-02-26 14:56:58 +08:00 |
|
Albert
|
e656f638de
|
[Doc] fix the incorrect module path of tensorize_vllm_model (#13863)
|
2025-02-25 22:56:19 -08:00 |
|
Harry Mellor
|
145944cb94
|
Improve pipeline partitioning (#13839)
|
2025-02-25 18:53:56 -08:00 |
|
Henry Tsang
|
094b7d9496
|
[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues (#13797)
|
2025-02-25 18:52:03 -08:00 |
|
Chenguang Li
|
e1fe7591f2
|
[Misc]Code Cleanup (#13859)
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
|
2025-02-26 10:44:30 +08:00 |
|
Lily Liu
|
5629f26df7
|
[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729)
|
2025-02-25 18:14:48 -08:00 |
|
Rui Qiao
|
9ba28043b5
|
[misc] Show driver IP info when Ray fails to allocate driver worker (#13858)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-02-26 09:53:43 +08:00 |
|
Harry Mellor
|
24679788ed
|
DeepSeek V2/V3/R1 only place lm_head on last pp rank (#13833)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-26 01:24:57 +00:00 |
|
Michael Goin
|
07c4353057
|
[Model] Support Grok1 (#13795)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-26 01:07:12 +00:00 |
|
Harry Mellor
|
34e3494e70
|
Fix failing MyGemma2Embedding test (#13820)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-25 12:33:03 -08:00 |
|
Liangfu Chen
|
f75aa72732
|
[Neuron] Add custom_ops for neuron backend (#13246)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: George Novack <gnovack@amazon.com>
Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>
|
2025-02-25 11:47:49 -08:00 |
|
Chen1022
|
340e39e387
|
Fix string parsing error (#13825)
|
2025-02-25 08:20:29 -08:00 |
|
Cyrus Leung
|
f4133ce4e5
|
[Bugfix] Revert inspection code in #13743 (#13832)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-26 00:18:50 +08:00 |
|
Wen Sun
|
6522d55b6f
|
Fix /v1/audio/transcriptions Bad Request Error (#13811)
|
2025-02-25 06:03:33 -08:00 |
|
Isotr0py
|
6ff518626c
|
[Bugfix] Fix deepseek-vl2 inference with more than 2 images (#13818)
|
2025-02-25 06:03:02 -08:00 |
|
Nichols A. Romero
|
fa82074167
|
[Bugfix] Flush TunableOp results before worker processes are destroyed. (#13623)
Signed-off-by: Nichols A. Romero <nick.romero@amd.com>
|
2025-02-25 11:08:20 +00:00 |
|
Junlin Zhou
|
75e9d49796
|
[Bugfix] Initialize attention bias on the same device as Query/Key/Value (#13468)
|
2025-02-25 02:13:09 -08:00 |
|
Chen1022
|
32c3b6bfd1
|
[Misc]Clarify Error Handling for Non-existent Model Paths and HF Repo IDs (#13724)
Signed-off-by: Chen-0210 <chenjincong11@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-02-25 10:12:19 +00:00 |
|
Jee Jee Li
|
37b6cb4985
|
[CI/Build] Fix V1 LoRA failure (#13767)
|
2025-02-25 02:01:15 -08:00 |
|
Gregory Shtrasberg
|
aabeb2688f
|
[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593)
|
2025-02-25 00:39:59 -08:00 |
|
Jiayi Yao
|
2f42a4888c
|
[Feature] Support KV cache offloading and disagg prefill with LMCache connector. (#12953)
|
2025-02-25 00:38:42 -08:00 |
|
Rui Qiao
|
3173c3b34e
|
[misc] Clean up ray compiled graph type hints (#13731)
|
2025-02-25 00:37:08 -08:00 |
|
Shanshan Shen
|
2d87d7d1ac
|
[Bugfix] Modify modelscope api usage in transformer_utils (#13807)
|
2025-02-25 00:36:07 -08:00 |
|
Russell Bryant
|
aab392774b
|
[Core] xgrammar: Expand list of unsupported jsonschema keywords (#13783)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-02-25 08:21:25 +00:00 |
|
Cyrus Leung
|
6724e79164
|
[Misc] Check that the model can be inspected upon registration (#13743)
|
2025-02-25 00:18:19 -08:00 |
|
Varun Sundar Rabindranath
|
03f48b3db6
|
[Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705)
|
2025-02-25 00:18:02 -08:00 |
|
Michael Goin
|
4d251ad00e
|
Fix CompressedTensorsWNA16MoE with grouped scales (#13769)
|
2025-02-25 00:17:14 -08:00 |
|
Michael Goin
|
18e505930d
|
[Bugfix] Support MLA for CompressedTensorsWNA16 (#13725)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-25 06:10:31 +00:00 |
|
Lucas Wilkinson
|
4a8cfc7551
|
[Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" (#13802)
|
2025-02-24 20:33:59 -08:00 |
|
Mark McLoughlin
|
bc32bc73aa
|
[V1][Metrics] Implement vllm:lora_requests_info metric (#13504)
|
2025-02-24 20:01:33 -08:00 |
|
wangxiyuan
|
ab1091d5f2
|
[Misc][Attention][Quantization] init property earlier (#13733)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-02-25 03:19:30 +00:00 |
|
Tyler Michael Smith
|
1e15aaef56
|
[Bugfix][Quantization] Fix FP8 + EP (#13784)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-02-25 10:54:17 +08:00 |
|
cjackal
|
51010a1807
|
[Misc] set single whitespace between log sentences (#13771)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2025-02-25 10:26:12 +08:00 |
|
Eli Boyarski
|
7196a3b1db
|
[Doc] arg_utils.py: fixed a typo (#13785)
|
2025-02-24 18:23:04 -08:00 |
|
Harry Mellor
|
cdc1fa12eb
|
Remove unused kwargs from model definitions (#13555)
|
2025-02-24 17:13:52 -08:00 |
|