3311 Commits

Author SHA1 Message Date
Michael Goin
bb01f2915e
[Bugfix][Model] Fix Mllama SDPA illegal memory access for batched multi-image (#9626)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-10-24 10:03:44 +08:00
Russell Bryant
b548d7a5f4
[CI/Build] Add bot to close stale issues and PRs (#9436)
2024-10-23 15:45:26 -07:00
Yunfei Chu
fc6c274626
[Model] Add Qwen2-Audio model support (#9248)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-23 17:54:22 +00:00
Alex Brooks
150b779081
[Frontend] Enable Online Multi-image Support for MLlama (#9393)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-23 17:28:57 +00:00
Yongzao
9013e24f7b
[torch.compile] Adding torch compile annotations to some models (#9614)
2024-10-23 10:07:48 -07:00
Michael Goin
fd0e2cfdb2
[Misc] Separate total and output tokens in benchmark_throughput.py (#8914)
2024-10-23 16:47:20 +00:00
Tyler Michael Smith
e5ac6a4199
[Bugfix] Fix divide by zero when serving Mamba models (#9617)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-10-23 16:40:43 +00:00
youkaichao
dbdd3b5e5a
[misc] comment to avoid future confusion about baichuan (#9620)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-23 09:14:44 -07:00
Cyrus Leung
e7116c017c
[Bugfix] Fix _init_vision_model in NVLM_D model (#9611)
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-23 14:09:04 +00:00
Alex Brooks
31a08f5bd2
[Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs (#9612)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-23 14:05:18 +00:00
Cyrus Leung
c18e1a3418
[VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args (#9217)
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-23 11:27:37 +00:00
Isotr0py
3ff57ebfca
[Model] Initialize Florence-2 language backbone support (#9555)
2024-10-23 10:42:47 +00:00
Mengqing Cao
2394962d70
[Hardware][XPU] using current_platform.is_xpu (#9605)
2024-10-23 08:28:21 +00:00
Luka Govedič
51c24c9736
[Build] Fix FetchContent multiple build issue (#9596)
Signed-off-by: luka <luka@neuralmagic.com>
2024-10-23 12:43:07 +08:00
Cyrus Leung
831540cf04
[Model] Support E5-V (#9576)
2024-10-23 11:35:29 +08:00
Flex Wang
29061ed9df
[Misc] Add an env var VLLM_LOGGING_PREFIX, if set, it will be prepend to all logging messages (#9590)
2024-10-23 11:17:28 +08:00
Chen Zhang
65050a40e6
[Bugfix] Generate exactly input_len tokens in benchmark_throughput (#9592)
2024-10-22 17:45:35 -07:00
Seth Kimmel
208cb34c81
[Doc]: Update tensorizer docs to include vllm[tensorizer] (#7889)
Co-authored-by: Kaunil Dhruv <dhruv.kaunil@gmail.com>
2024-10-22 15:43:25 -07:00
yulei
b17046e298
[BugFix] Fix metrics error for --num-scheduler-steps > 1 (#8234)
2024-10-22 15:43:03 -07:00
Lucas Wilkinson
d1e8240875
[Bugfix] Fix spurious "No compiled cutlass_scaled_mm ..." for W8A8 on Turing (#9487)
2024-10-22 15:41:13 -07:00
Jeremy Arnold
cb6fdaa0a0
[Misc] Make benchmarks use EngineArgs (#9529)
2024-10-22 15:40:38 -07:00
Aurick Qiao
23b899a8e6
[Bugfix] fix detokenizer shallow copy (#5919)
2024-10-22 15:38:12 -07:00
youkaichao
17c79f3c36
[torch.compile] auto infer dynamic_arg_dims from type annotation (#9589)
2024-10-22 13:43:37 -07:00
Ronen Schaffer
cd5601ac37
[BugFix] Prevent exporting duplicate OpenTelemetry spans (#9017)
2024-10-22 11:11:53 -07:00
Yuhong Guo
434984e665
[Frontend] Support custom request_id from request (#9550)
Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2024-10-22 18:07:30 +00:00
Yuan
32a1ee74a0
[Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com>
Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com>
2024-10-22 10:38:04 -07:00
gopalsarda
08075c3448
[Bugfix] Eagle: change config name for fc bias (#9580)
2024-10-22 16:14:22 +00:00
Isotr0py
bb392ea2d2
[Model][VLM] Initialize support for Mono-InternVL model (#9528)
2024-10-22 16:01:46 +00:00
xendo
9dbcce84a7
[Neuron] [Bugfix] Fix neuron startup (#9374)
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
2024-10-22 12:51:41 +00:00
Jee Jee Li
a48e3ec052
[CI/Build][LoRA] Temporarily fix long context failure issue (#9579)
2024-10-22 11:32:51 +00:00
Woosuk Kwon
6c5af09b39
[V1] Implement vLLM V1 [1/N] (#9289)
2024-10-22 01:24:07 -07:00
wangshuai09
3ddbe25502
[Hardware][CPU] using current_platform.is_cpu (#9536)
2024-10-22 00:50:43 -07:00
chenqianfzh
0d02747f2e
support TP in qwen2 bnb (#9574)
2024-10-22 07:13:23 +00:00
Rafael Vasquez
f7db5f0fa9
[Doc] Use shell code-blocks and fix section headers (#9508)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-22 06:43:24 +00:00
Kuntai Du
ca30c3c84b
[Core] Remove evictor_v1 (#9572)
2024-10-22 04:55:49 +00:00
Wallas Henrique
c0292211ce
[CI/Build] Replaced some models on tests for smaller ones (#9570)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-22 04:52:14 +00:00
Falko1
74692421f7
[Bugfix]: phi.py get rope_theta from config file (#9503)
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-22 02:53:36 +00:00
ngrozae
29acd2c34c
[Bugfix][OpenVINO] fix_dockerfile_openvino (#9552)
2024-10-21 19:47:52 -07:00
Cyrus Leung
f085995a7b
[CI/Build] Remove unnecessary fork_new_process (#9484)
2024-10-21 19:47:29 -07:00
Travis Johnson
b729901139
[Bugfix]: serialize config by value for --trust-remote-code (#6751)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-21 19:46:24 -07:00
youkaichao
76a5e13270
[core] move parallel sampling out from vllm core (#9302)
2024-10-22 00:31:44 +00:00
Joe Runde
ef7faad1b8
🐛 Fixup more test failures from memory profiling (#9563)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-21 17:10:56 -07:00
Kuntai Du
575dcebe9a
[CI] Make format checker error message more user-friendly by using emoji (#9564)
This PR makes format checker error message more user-friendly by adding emojis.
2024-10-21 23:45:15 +00:00
Wallas Henrique
711f3a7806
[Frontend] Don't log duplicate error stacktrace for every request in the batch (#9023)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-21 14:49:41 -07:00
Nick Hill
15713e3b75
[BugFix] Update draft model TP size check to allow matching target TP size (#9394)
Co-authored-by: Baoyuan Qi <qibaoyuan@126.com>
2024-10-21 14:14:29 -07:00
youkaichao
d621c43df7
[doc] fix format (#9562)
2024-10-21 13:54:57 -07:00
Nick Hill
9d9186be97
[Frontend] Reduce frequency of client cancellation checking (#7959)
2024-10-21 13:28:10 -07:00
Michael Goin
5241aa1494
[Model][Bugfix] Fix batching with multi-image in PixtralHF (#9518)
2024-10-21 14:20:07 -04:00
Varad Ahirwadkar
ec6bd6c4c6
[BugFix] Use correct python3 binary in Docker.ppc64le entrypoint (#9492)
Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com>
2024-10-21 17:43:02 +00:00
yudian0504
8ca8954841
[Bugfix][Misc]: fix graph capture for decoder (#9549)
2024-10-21 17:33:30 +00:00