Commit Graph - vllm - Luminance Code Repo

20231088/vllm

26507f8973

[Docs] Fix a link and grammar issue in production-stack.md (#16809) main Michael Yao 2025-04-18 14:42:58 +08:00
9c1d5b456d

[Doc] add podman setup instructions for official image (#16796) Nathan Weinberg 2025-04-18 02:10:49 -04:00
e31045f95c

[Bugfix] fix pp for llama4 (#16746) Lucia Fang 2025-04-17 22:51:30 -07:00
aaec845f8e

[ROCm] [Attention] Cleanup ROCm output passing (#16431) Luka Govedič 2025-04-18 01:46:45 -04:00
7bdfd29a35

[Misc] add collect_env to cli and docker image (#16759) rongfu.leng 2025-04-18 13:13:35 +08:00
e78587a64c

Improve-mm-and-pooler-and-decoding-configs (#16789) Harry Mellor 2025-04-18 06:13:32 +01:00
7eb4255628

[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801) Lucas Wilkinson 2025-04-18 01:13:29 -04:00
6a0f547561

Add hardware print to TPU V1 test (#16792) Michael Goin 2025-04-17 23:13:26 -06:00
30ed81b7ca

[V1][Structured Output] Minor modification to _validate_structured_output() (#16748) Shanshan Shen 2025-04-18 13:12:54 +08:00
7a4a5de729

[Misc] Update outdated note: LMCache now supports chunked prefill (#16697) Chauncey 2025-04-18 13:12:42 +08:00
c16fb5dae8

[Doc] Improve help examples for --compilation-config (#16729) Cyrus Leung 2025-04-18 12:22:34 +08:00
e37073efd7

Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema (#16721) Tarun Kumar 2025-04-18 09:38:27 +05:30
183dad7a85

[Attention] Update to lastest FA3 code (#13111) Lucas Wilkinson 2025-04-17 18:14:07 -04:00
3408e47159

[P/D][V1] KV Connector API V1 (#15960) Yihua Cheng 2025-04-17 15:22:40 -05:00
0377b8310b

[MLA] Simplification to batch P/D reordering (#16673) Nick Hill 2025-04-17 13:12:09 -07:00
e4755f7fac

[V1][Metrics] Fix http metrics middleware (#15894) Mark McLoughlin 2025-04-17 20:52:18 +01:00
92edf35826

[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints (#16674) Sijia(Jackson) Chen 2025-04-17 11:44:34 -07:00
eb5819b2d9

[V1][TPU] Enable Top K (#15489) Nicolò Lucchesi 2025-04-17 20:18:11 +02:00
5989f4684d

[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even (#16726) Nicolò Lucchesi 2025-04-17 20:09:57 +02:00
5125d72f02

[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small (#16548) rongfu.leng 2025-04-18 01:48:31 +08:00
a018e555fd

[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753) Ximingwang-09 2025-04-18 00:01:30 +08:00
6211b92273

[Bugfix]Fix index out of range error in api server log (#16787) Robin 2025-04-18 00:01:07 +08:00
05fcd1b430

[V1][Perf] Faster incremental detokenization (#15137) Nick Hill 2025-04-17 07:45:24 -07:00
7c02d6a137

[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion (#16784) Insu Kim 2025-04-17 23:10:08 +09:00
11c3b98491

[Doc] Document Matryoshka Representation Learning support (#16770) wang.yuqi 2025-04-17 21:37:37 +08:00
dbe7f07001

[Doc] Make sure to update vLLM when installing latest code (#16781) Cyrus Leung 2025-04-17 20:53:31 +08:00
c69bf4ee06

fix: hyperlink (#16778) Reid 2025-04-17 19:34:20 +08:00
d27ea94034

Improve configs - TokenizerPoolConfig + DeviceConfig (#16603) Harry Mellor 2025-04-17 12:19:42 +01:00
99ed526101

[Misc] refactor examples series - lmcache (#16758) Reid 2025-04-17 19:02:35 +08:00
207da28186

[Doc] Fix a 404 link in installation/cpu.md (#16773) Michael Yao 2025-04-17 18:46:21 +08:00
5b1aca2ae3

[Bugfix] Fix GLM4 model (#16618) intervitens 2025-04-17 13:35:07 +03:00
d8e557b5e5

[doc] add open-webui example (#16747) Reid 2025-04-17 18:27:32 +08:00
61a44a0b22

[Doc] Add more tips to avoid OOM (#16765) Cyrus Leung 2025-04-17 17:54:34 +08:00
a6481525b8

[misc] ignore marlin_moe_wna16 local gen codes (#16760) DefTruth 2025-04-17 17:15:14 +08:00
8cac35ba43

[Ray] Improve documentation on batch inference (#16609) Richard Liaw 2025-04-16 22:19:26 -07:00
9dbf7a2dc1

[V1] Remove log noise when idle (#16735) Russell Bryant 2025-04-17 00:34:08 -04:00
607029e515

[Bugfix] Revert max_prompt_len validation for decoder-only models. (#16741) David Heineman 2025-04-16 21:33:15 -07:00
cb072ce93b

[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734) Isotr0py 2025-04-17 12:17:39 +08:00
95aca283b4

[rocm][V0] fix selection logic for custom PA in V0 (#16426) Divakar Verma 2025-04-16 21:52:11 -05:00
2b05b8ce69

[V1][Frontend] Improve Shutdown And Logs (#11737) Robert Shaw 2025-04-16 22:48:34 -04:00
3c776dcefb

Adding vllm buildkite job for IBM Power (#16679) Aaruni Aggarwal 2025-04-17 08:17:47 +05:30
2cbd4d2999

[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636) Bryan Lu 2025-04-16 19:47:26 -07:00
3092375e27

[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] (#16432) Staszek Paśko 2025-04-17 04:28:32 +02:00
3cd91dc955

Help user create custom model for Transformers backend remote code models (#16719) Harry Mellor 2025-04-17 02:05:59 +01:00
8a7368e069

[Misc] Remove redundant comment (#16703) Jade Zheng 2025-04-17 08:44:52 +08:00
93e561ec4d

Improve error for structured output backend selection (#16717) Harry Mellor 2025-04-17 01:35:35 +01:00
e1b004839a

[Hardware] Add processor inputs to platform validation (#16680) Joe Runde 2025-04-16 18:28:42 +02:00
ee378f3d49

[Model] support modernbert (#16648) xsank 2025-04-16 20:30:15 +08:00
e82ee40de3

[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693) DefTruth 2025-04-16 18:31:39 +08:00
facbe2a114

[Doc] Improve OOM troubleshooting (#16704) Cyrus Leung 2025-04-16 18:29:48 +08:00
7168920491

[Misc] refactor examples series (#16708) Reid 2025-04-16 18:16:36 +08:00
21378a2323

[CI] Cleanup additional_dependencies: [toml] for pre-commit yapf hook (#16405) Kay Yan 2025-04-16 18:05:31 +08:00
976711d9db

[V1][Structured Output] Move xgrammar related utils to backend_xgrammar.py (#16578) Shanshan Shen 2025-04-16 17:01:36 +08:00
44fa4d556c

[ROCM] Bind triton version to 3.2 in requirements-built.txt (#16664) Sage Moore 2025-04-15 23:05:28 -07:00
3ac98edcb1

[Feature] add model aware kv ops helper (#16020) billishyahao 2025-04-16 14:00:43 +08:00
966c742ed2

Disable remote caching when calling compile_fx (#16611) Richard Zou 2025-04-16 01:18:28 -04:00
0d7d05f4b6

[Misc] Modify LRUCache touch (#16689) Jee Jee Li 2025-04-16 12:51:38 +08:00
96bb8aa68b

[Bugfix] fix gpu docker image mis benchmarks dir (#16628) rongfu.leng 2025-04-16 12:21:14 +08:00
3badb0213b

[Model] Add PLaMo2 (#14323) Shinichi Hemmi 2025-04-16 11:31:30 +09:00
fdcb850f14

[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546) Angky William 2025-04-15 15:31:38 -07:00
54a66e5fee

[Misc] Update compressed-tensors WNA16 to support zero-points (#14211) Dipika Sikka 2025-04-15 09:33:51 -04:00
280d62b8a2

[Kernel] Remove redundant Exp calculations (#16123) DefTruth 2025-04-15 20:58:37 +08:00
1666e66443

Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572) Xihui Cang 2025-04-15 19:50:38 +08:00
1575c1701a

[CI/Build] Fix LoRA OOM (#16624) Jee Jee Li 2025-04-15 16:38:19 +08:00
6ae996a873

[Misc] refactor argument parsing in examples (#16635) Reid 2025-04-15 16:05:30 +08:00
b590adfdc1

Fix vLLM x torch.compile config caching (#16491) Richard Zou 2025-04-15 02:11:11 -04:00
b4fe16c75b

Add vllm bench [latency, throughput] CLI commands (#16508) Michael Goin 2025-04-15 00:10:35 -06:00
bc5dd4f669

[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) (#16631) Pooya Davoodi 2025-04-14 23:09:58 -07:00
dbb036cf61

[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py (#16623) Tyler Michael Smith 2025-04-15 01:35:38 -04:00
70e7ed841d

[BugFix]: Update minimum pyzmq version (#16549) Taneem Ibrahim 2025-04-14 22:06:03 -05:00
d06ba4ed3f

[Kernel] moe wna16 marlin kernel (#14447) Jinzhen Lin 2025-04-15 11:05:22 +08:00
6b40996ae8

[Core][Bugfix] Fix Offline MM Beam Search (#16390) Alex Brooks 2025-04-14 20:33:02 -06:00
d2020acac7

config check sleep mode support oot platforms (#16562) Shuqiao Li 2025-04-15 07:31:50 +08:00
1eb3c2ed48

[DOC][TPU] Add core idea about avoiding recompilation after warmup (#16614) Chengji Yao 2025-04-14 14:56:06 -07:00
c64ee87267

[Hardware][TPU] Add torchvision to tpu dependency file (#16616) Siyuan Liu 2025-04-14 14:50:46 -07:00
b1308b84a3

[Model][VLM] Add Kimi-VL model support (#16387) courage17340 2025-04-15 05:41:48 +08:00
7b5ecf79bd

s390x: Fix PyArrow build and add CPU test script for Buildkite CI (#16036) Nishan Acharya 2025-04-14 23:25:32 +05:30
9883a18859

Fix triton install condition on CPU (#16600) Harry Mellor 2025-04-14 18:06:01 +01:00
b3f2fddd17

[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 (#16596) Nicolò Lucchesi 2025-04-14 19:01:05 +02:00
aa29841ede

[Bugfix] Multi-modal caches not acting like LRU caches (#16593) Cyrus Leung 2025-04-15 00:24:16 +08:00
6bf27affb6

[fix]: Dockerfile.ppc64le fixes for opencv-python and hf-xet (#16048) Md. Shafi Hussain 2025-04-14 21:38:39 +05:30
1dd23386ec

[Misc] Update usage with mooncake lib for kv transfer (#16523) shangmingc 2025-04-14 19:31:37 +08:00
7cbfc10943

[Misc] refactor examples (#16563) Reid 2025-04-14 17:59:15 +08:00
ce4ddd2d1a

[Misc] remove warning if triton>=3.2.0 (#16553) DefTruth 2025-04-14 17:39:47 +08:00
e51929ebca

Improve configs - SchedulerConfig (#16533) Harry Mellor 2025-04-14 10:24:16 +01:00
dc1b4a6f13

[Core][V0] Enable regex support with xgrammar (#13228) Russell Bryant 2025-04-13 22:13:38 -04:00
63d2705edb

[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556) Jennifer Zhao 2025-04-13 17:20:26 -07:00
d085a44082

Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (#16537) Michael Goin 2025-04-13 08:55:18 -06:00
f49e5aff11

[V1][Spec Decode] KV cache slots for eagle heads (#16370) Lily Liu 2025-04-12 19:42:51 -07:00
6c11ecf8d3

[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529) Ryan McConville 2025-04-12 21:19:19 +01:00
93e5f3c5fb

[Perf] Optimize Preparing Inputs for GPU Model Runner (#16484) SnowCharm 2025-04-12 22:54:37 +08:00
70363bccfa

Fix syntaxWarning: invalid escape sequence '\s' (#16532) Jie Fu (傅杰) 2025-04-12 22:39:42 +08:00
3cdc57669f

[Misc] Delete redundant code (#16530) Jee Jee Li 2025-04-12 19:21:37 +08:00
68bb122eb4

[MISC] Make GroupCoordinator compatible with out-of-tree devices (#16464) Huazhong Ji 2025-04-12 17:20:25 +08:00
d9fc8cd9da

[V1] Enable multi-input by default (#15799) Cyrus Leung 2025-04-12 16:52:39 +08:00
f069f3ea74

[Misc] Openai transcription client example use same Whisper model (#16487) Nicolò Lucchesi 2025-04-12 09:27:03 +02:00
c5bc0e7fcc

[Misc] Update chat utils tests (#16520) Cyrus Leung 2025-04-12 14:48:43 +08:00
4a3a518722

fix: spelling (#16466) Tianer Zhou 2025-04-12 14:24:22 +08:00
fbf722c6e6

[Frontend] support matryoshka representation / support embedding API dimensions (#16331) wang.yuqi 2025-04-12 14:23:10 +08:00
e92d7085bf

[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516) leon-seidel 2025-04-12 08:22:07 +02:00
