Commit Graph - vllm - Luminance Code Repo

20231088/vllm

e11880deea

[Bugfix] Remove triton do_bench fast_flush arg (#16256) Kebe 2025-04-08 21:51:06 +08:00
9351f91be9

[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247) TY-AMD 2025-04-08 20:10:26 +08:00
5a1e1c8353

[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (#16203) rongfu.leng 2025-04-08 19:05:47 +08:00
69ecaa7c79

[Misc] Add warning for multimodal data in LLM.beam_search (#16241) Alex Brooks 2025-04-08 05:05:27 -06:00
7f00899ff7

[Misc] format and refactor some examples (#16252) Reid 2025-04-08 18:42:32 +08:00
995e3d1f41

[Docs] Add Slides from Singapore Meetup (#16213) Simon Mo 2025-04-08 00:20:22 -07:00
b4ac449a83

[Misc] Merge the logs of pp layers partitions (#16225) Kebe 2025-04-08 15:18:15 +08:00
8e5314a468

[V1] Add disable_chunked_mm_input arg to disable partial mm input prefill (#15837) Michael Goin 2025-04-08 00:24:07 -06:00
87918e40c4

[torch.compile][TPU] Make @support_torch_compile work for XLA backend (#15782) Siyuan Liu 2025-04-07 23:23:53 -07:00
f6b32efb7f

[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (#16194) Isotr0py 2025-04-08 13:38:13 +08:00
b99733d092

[Bugfix] Do not skip "empty" parts of chats that are parsable (#16219) Michael Goin 2025-04-07 23:14:15 -06:00
05a015d6a5

Add warning for Attention backends that do not support irope yet (#16212) Yong Hoon Shin 2025-04-07 20:59:26 -07:00
ad971af8c7

[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 (#16161) zxfan-cpu 2025-04-08 11:48:47 +08:00
f2ebb6f541

[V1] Scatter and gather placeholders in the model runner (#16076) Roger Wang 2025-04-07 19:43:41 -07:00
1d01211264

Update BASE_IMAGE to 2.22 release of Neuron (#16218) Satyajith Chilappagari 2025-04-07 19:11:18 -07:00
f94ab12f79

[Misc] Update compressed-tensors to version 0.9.3 (#16196) Miles Williams 2025-04-08 03:09:06 +01:00
a865bc1ca6

[core] do not send error across process (#16174) youkaichao 2025-04-08 10:09:03 +08:00
21802c4b6d

[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (#16031) Michael Goin 2025-04-07 19:28:14 -06:00
652907b354

Torchao (#14231) Driss Guessous 2025-04-07 16:39:28 -07:00
24f1c01e0f

[Bugfix][V0] XGrammar structured output supports Enum (#15878) leon-seidel 2025-04-08 00:38:25 +02:00
fad6e2538e

[Misc] add description attribute in CLI (#15921) Reid 2025-04-08 06:30:35 +08:00
7f6d47c1a2

[V1][BugFix] Exit properly if engine core fails during startup (#16137) Nick Hill 2025-04-07 15:30:15 -07:00
3147586ebd

[Bugfix] Fix guidance backend for Qwen models (#16210) Benjamin Chislett 2025-04-07 18:15:43 -04:00
ed636d99ca

[Misc] Move Llama 4 projector call into encoder execution (#16201) Roger Wang 2025-04-07 14:02:05 -07:00
090c856d76

[Misc] Human-readable max-model-len cli arg (#16181) Nicolò Lucchesi 2025-04-07 20:40:58 +02:00
ad434d4cfe

Print the warning only once (#16193) Gregory Shtrasberg 2025-04-07 14:30:06 -04:00
66d433b94f

[V1] Revert the default max_num_seqs to V0 values for most hardware (#16158) Cyrus Leung 2025-04-08 01:54:36 +08:00
027b204ff1

[Bugfix] Re-enable support for ChatGLMForConditionalGeneration (#16187) Cyrus Leung 2025-04-07 23:15:58 +08:00
55dcce91df

Upstream Llama4 Support to Main (#16113) Lu Fang 2025-04-07 08:06:27 -07:00
8017c8db7f

[Doc]Update image to latest version (#16186) Robin 2025-04-07 22:17:39 +08:00
dc3529dbf6

[Misc] improve example mlpspeculator and llm_engine_example (#16175) Reid 2025-04-07 19:53:52 +08:00
7699258ef0

[Model] Add Qwen3 and Qwen3MoE (#15289) YamPengLi 2025-04-07 19:06:41 +08:00
e9ba99f296

[V1][Structured Output] Add supports_structured_output() method to Platform (#16148) Shanshan Shen 2025-04-07 19:06:24 +08:00
7c80368710

[VLM] Florence-2 supports online serving (#16164) Isotr0py 2025-04-07 19:04:02 +08:00
95d63f38c0

doc: fix some typos in doc (#16154) yihong 2025-04-07 13:32:06 +08:00
bb8dab821e

[CI] Set max transformers version for Ultravox model test (#16149) Roger Wang 2025-04-06 21:37:58 -07:00
fc0f87768a

[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings (#16129) Isotr0py 2025-04-07 12:07:15 +08:00
0a57386721

[Misc] Update Mistral-3.1 example (#16147) Cyrus Leung 2025-04-07 11:57:37 +08:00
3749e28774

[V1][Minor] Minor simplification for get_computed_blocks (#16139) Woosuk Kwon 2025-04-06 20:38:12 -07:00
86fc2321ff

[Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token (#15202) Kay Yan 2025-04-07 11:34:51 +08:00
2549c0dfef

Fix requires-python (#16132) Martin Hoyer 2025-04-07 04:22:25 +02:00
b10e519895

[V1][Minor] Optimize get_cached_block (#16135) Woosuk Kwon 2025-04-06 13:48:14 -07:00
9bde5ba127

[TPU] Update PyTorch/XLA (#16130) Chengji Yao 2025-04-06 11:25:55 -07:00
72c8f1ad04

[Misc] update requires-python in pyproject.toml (#16116) Reid 2025-04-06 22:56:34 +08:00
da224daaa9

[Bugfix] add hf_token to EngineArgs (#16093) paolovic 2025-04-06 16:47:33 +02:00
3a100b9278

[Bugfix] LoRA : Fix the order in which the kernels process LoRAs (#16040) Varun Sundar Rabindranath 2025-04-06 10:04:50 -04:00
242a637aea

[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (#16103) rongfu.leng 2025-04-06 20:52:01 +08:00
c2a9671510

[Misc] Improve model redirect to accept json dictionary (#16119) Isotr0py 2025-04-06 20:51:45 +08:00
d5ae4f7f42

[Doc][Bugfix] Add missing EOF in k8s deploy doc (#16025) Paul Schweigert 2025-04-06 08:10:57 -04:00
b6c502a150

[Misc] refactor example eagle (#16100) Reid 2025-04-06 17:42:48 +08:00
9ca710e525

[CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar (#16117) Roger Wang 2025-04-06 01:18:00 -07:00
eb07c8cb5b

[Frontend] Fix typo in tool chat templates for llama3.2 and toolace (#14501) Ben Jackson 2025-04-06 00:44:36 -07:00
ba10801961

[Benchmark] Add sampling parameters to benchmark_serving. (#16022) Hyesoo Yang 2025-04-05 21:30:35 -07:00
620fc2d09e

[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 (#16112) Lucia Fang 2025-04-05 21:23:40 -07:00
29283eaa7e

[Model] use AutoWeightsLoader for phi, gemma, deepseek (#16088) Jonghyun Choe 2025-04-06 12:34:38 +09:00
2fa66ef713

[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946) Jinzhen Lin 2025-04-06 11:04:22 +08:00
13affc432d

[Misc] Remove redundant code (#16098) Chauncey 2025-04-06 11:03:50 +08:00
d8f094a92a

[Misc] format output for encoder_decoder.py (#16095) Reid 2025-04-06 10:57:18 +08:00
97ae6d777f

Fix some capitalisations in generated examples doc titles (#16094) Harry Mellor 2025-04-05 14:44:03 +01:00
6baeee70d1

Revert "doc: add info for macos clang errors (#16049)" (#16091) yihong 2025-04-05 19:51:51 +08:00
d2517a4939

[doc] fix 404 (#16082) Reid 2025-04-05 19:39:18 +08:00
6342adc438

fix: support clang17 for macos and fix the real libomp (#16086) yihong 2025-04-05 19:00:12 +08:00
0adba91547

[CI] Fix benchmark script level (#16089) Kevin H. Luu 2025-04-05 03:36:01 -07:00
4285e423a6

[Misc] Auto detect bitsandbytes pre-quantized models (#16027) Tristan Leclercq 2025-04-05 08:30:45 +02:00
63375f0cdb

[V1][Spec Decode] Update N-gram Proposer Interface (#15750) Woosuk Kwon 2025-04-04 16:32:54 -07:00
70ad3f9e98

[Bugfix][TPU] Fix V1 TPU worker for sliding window (#16059) Michael Goin 2025-04-04 17:31:19 -06:00
d6fc629f4d

[Kernel][Minor] Re-fuse triton moe weight application (#16071) bnellnm 2025-04-04 19:27:34 -04:00
af51d80fa1

Revert "[V1] Scatter and gather placeholders in the model runner" (#16075) Roger Wang 2025-04-04 14:50:57 -07:00
f5722a5052

[V1] Scatter and gather placeholders in the model runner (#15712) Cyrus Leung 2025-04-05 05:26:44 +08:00
651cf0fec1

[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (#15906) Nick Hill 2025-04-04 12:56:43 -07:00
4dc52e1c53

[CI] Reorganize .buildkite directory (#16001) Kevin H. Luu 2025-04-04 12:16:20 -07:00
4708f13a9c

[Bugfix] Fix default behavior/fallback for pp in v1 (#16057) Michael Goin 2025-04-04 11:58:08 -06:00
a6d042df0a

[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only (#15413) Gregory Shtrasberg 2025-04-04 12:40:37 -04:00
40a36ccfeb

[ROCm][Bugfix] Use platform specific FP8 dtype (#15717) Gregory Shtrasberg 2025-04-04 12:40:20 -04:00
ef608c37a7

[Distributed] [ROCM] Fix custom allreduce enable checks (#16010) Ilya Markov 2025-04-04 18:39:08 +02:00
2386803f2a

[CPU] Change default block_size for CPU backend (#16002) Li, Jiang 2025-04-05 00:39:05 +08:00
95862f7b4d

[Benchmark][Doc] Update throughput benchmark and README (#15998) Ziji Shi (Steven) 2025-04-04 09:39:02 -07:00
230b131b54

[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995) Isotr0py 2025-04-05 00:38:58 +08:00
0812d8dd41

[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945) liuzhenwei 2025-04-05 00:38:55 +08:00
bf7e3c51ae

[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939) Jonghyun Choe 2025-04-05 01:38:52 +09:00
a35a8a8392

[V1][Spec Decode] Avoid logging useless nan metrics (#16023) Mark McLoughlin 2025-04-04 16:52:41 +01:00
4ef0bb1fcf

doc: add info for macos clang errors (#16049) yihong 2025-04-04 22:58:16 +08:00
fadc59c0e6

[TPU][V1] Remove ragged attention kernel parameter hard coding (#16041) Chengji Yao 2025-04-04 04:48:50 -07:00
86cbd2eee9

[Misc] improve gguf check (#15974) Reid 2025-04-04 09:33:36 +08:00
092475f738

[ROCm] Tweak the benchmark script to run on ROCm (#14252) Huy Do 2025-04-03 17:12:48 -07:00
dcc56d62da

[Bugfix] Fix function names in test_block_fp8.py (#16033) bnellnm 2025-04-03 19:01:34 -04:00
f15e70d906

[TPU] Switch Test to Non-Sliding Window (#15981) Robert Shaw 2025-04-03 14:28:45 -07:00
b6be6f8d1e

[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732) iefgnoix 2025-04-03 14:23:28 -07:00
03a70eacaf

Re-enable the AMD Testing for the passing tests. (#15586) Alexei-V-Ivanov-AMD 2025-04-03 13:05:17 -05:00
45b1ff7a25

[Misc][Performance] Advance tpu.txt to the most recent nightly torch … (#16024) yarongmu-google 2025-04-03 10:32:54 -07:00
15ba07ef25

[Minor] Fused experts refactor (#15914) bnellnm 2025-04-03 13:19:38 -04:00
d2b58ca203

[Neuron][kernel] Fuse kv cache into a single tensor (#15911) Liangfu Chen 2025-04-03 09:51:32 -07:00
82e7e19a6e

[SupportsQuant] Chameleon, Chatglm, Commandr (#15952) Kyle Sayers 2025-04-03 11:25:22 -04:00
421c462948

[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573) Kyle Sayers 2025-04-03 11:23:19 -04:00
84884cd9ac

fix: tiny fix make format.sh excutable (#16015) yihong 2025-04-03 23:18:05 +08:00
a43aa183dc

[doc] update contribution link (#15922) Reid 2025-04-03 18:47:31 +08:00
463bbb1835

[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367) wwl2755 2025-04-03 02:32:10 -05:00
5e125e74d1

[misc] improve error message for "Failed to infer device type" (#15994) youkaichao 2025-04-03 14:45:03 +08:00
06f21ce7a5

[Benchmark] Add AIMO Dataset to Benchmark (#15955) Ziji Shi (Steven) 2025-04-02 23:09:18 -07:00
57a810db9c

[ROCM][V0] PA kennel selection when no sliding window provided (#15982) Aleksandr Malyshev 2025-04-02 22:28:44 -07:00
