211 Commits

Author SHA1 Message Date
elijah
c6db21313c
bugfix: Fix signature mismatch in benchmark's get_tokenizer function (#11982)
Signed-off-by: elijah <f1renze.142857@gmail.com>
2025-01-13 15:22:07 +00:00
minmin
8a579408f3
[Misc] Update benchmark_prefix_caching.py fixed example usage (#11920)
Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn>
Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn>
2025-01-10 20:39:22 +00:00
Kuntai Du
5959564f94
Doc fix in benchmark_long_document_qa_throughput.py (#11933)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-01-10 23:51:43 +08:00
Ye (Charlotte) Qi
1d967acb45
[Bugfix] fix beam search input errors and latency benchmark script (#11875)
Signed-off-by: Ye Qi <yeq@meta.com>
Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>
2025-01-09 17:36:39 +08:00
Divakar Verma
4d29e91be8
[Misc] sort torch profiler table by kernel timing (#11813) 2025-01-08 10:57:04 +08:00
Yihua Cheng
0c6f998554
[Benchmark] Add benchmark script for CPU offloading (#11533)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
2025-01-01 00:10:55 +00:00
Jiaxin Shan
fc601665eb
[Misc] Update disaggregation benchmark scripts and test logs (#11456)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
2024-12-25 06:58:48 +00:00
Varun Sundar Rabindranath
98356735ac
[misc] benchmark_throughput : Add LoRA (#11267)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-19 15:43:16 +08:00
Dipika Sikka
60508ffda9
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-18 09:57:16 -05:00
Roger Wang
02222a0256
[Misc] Kernel Benchmark for RMSNorm (#11241)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiaoyu Zhang <BBuf@users.noreply.github.com>
2024-12-17 06:57:02 +00:00
Alexander Matveev
238c0d93b4
[Misc] Add tokenizer_mode param to benchmark_serving.py (#11174)
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
2024-12-13 16:19:10 +00:00
Luka Govedič
30870b4f66
[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-13 03:19:23 +00:00
Chendi.Xue
82eb5ea8f3
Benchmark serving structured output (#10880)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-12-04 16:28:21 -05:00
Chendi.Xue
381ac93bb5
[Benchmark] Benchmark structured output with datasets (#10557)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2024-12-03 17:21:06 -07:00
Michael Goin
4433195ab7
[Bugfix] Prevent benchmark_throughput.py from using duplicated random prompts (#10753) 2024-12-03 02:26:15 +00:00
Kuntai Du
0590ec3fd9
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2024-12-01 19:01:00 -06:00
Roger Wang
c11f172187
[Misc] Adding MMMU-Pro vision dataset to serving benchmark (#10804)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-01 08:47:05 +00:00
Wang, Yi
8a93a598d9
fix the issue that len(tokenizer(prompt)["input_ids"]) > prompt_len (#10524)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-11-21 11:15:36 +00:00
ElizaWszola
b00b33d77e
[Model][Quantization] HQQ support through Marlin kernel expansion (#9766)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
2024-11-19 13:31:12 -08:00
Ricky Xu
90a6c759ca
[misc] partial prefix & random input generation benchmark (#9929)
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-18 15:39:14 -08:00
Lucas Wilkinson
96d999fbe8
[Kernel] Initial Machete W4A8 support + Refactors (#9855)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-11-18 12:59:29 -07:00
Jaehyun An
8b6725b0cf
[Misc] Update benchmark to support image_url file or http (#10287)
Signed-off-by: rbbang <anjaehyun87@gmail.com>
2024-11-16 18:15:40 +08:00
Cyrus Leung
f4c2187e29
[Misc] Fix typo in #5895 (#10145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-08 09:07:01 +00:00
DearPlanet
ad39bd640c
[Bugfix] Add error handling when server cannot respond any valid tokens (#5895) 2024-11-08 04:58:37 +00:00
Cody Yu
201fc07730
[V1] Prefix caching (take 2) (#9972)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2024-11-07 17:34:44 -08:00
Russell Bryant
3be5b26a76
[CI/Build] Add shell script linting using shellcheck (#7925)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 18:17:29 +00:00
Atlas
a62bc0109c
[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (#10105)
Signed-off-by: Mozhou <spli161006@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-11-07 11:20:30 +00:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
lkchen
d2e80332a7
[Feature] Update benchmark_throughput.py to support image input (#9851)
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
2024-11-05 19:30:02 +00:00
lkchen
9a5664d4a4
[Misc] Refactor benchmark_throughput.py (#9779)
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <lkchen@github.com>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
2024-11-04 14:32:16 -08:00
Tran Quang Dai
ea4adeddc1
[Bugfix] Fix E2EL mean and median stats (#9984)
Signed-off-by: daitran2k1 <tranquangdai7a@gmail.com>
2024-11-04 09:37:58 +00:00
Guillaume Calmettes
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837) 2024-10-30 18:15:56 -07:00
wangshuai09
622b7ab955
[Hardware] using current_platform.seed_everything (#9785)
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-29 14:47:44 +00:00
youkaichao
32176fee73
[torch.compile] support moe models (#9632)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-27 21:58:04 -07:00
Michael Goin
fd0e2cfdb2
[Misc] Separate total and output tokens in benchmark_throughput.py (#8914) 2024-10-23 16:47:20 +00:00
Chen Zhang
65050a40e6
[Bugfix] Generate exactly input_len tokens in benchmark_throughput (#9592) 2024-10-22 17:45:35 -07:00
Jeremy Arnold
cb6fdaa0a0
[Misc] Make benchmarks use EngineArgs (#9529) 2024-10-22 15:40:38 -07:00
Andy Dai
855e0e6f97
[Frontend][Misc] Goodput metric support (#9338) 2024-10-20 18:39:32 +00:00
Russell Bryant
7dbe738d65
[Misc] benchmark: Add option to set max concurrency (#9390)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-18 11:15:28 -07:00
Kai Wu
d65049daab
[Bugfix] Add random_seed to sample_hf_requests in benchmark_serving script (#9013)
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-17 21:11:11 +00:00
Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
Cyrus Leung
7e7eae338d
[Misc] Standardize RoPE handling for Qwen2-VL (#9250) 2024-10-16 13:56:17 +08:00
Grace Ho
5d264f4ab8
pass ignore_eos parameter to all benchmark_serving calls (#9349) 2024-10-15 13:30:44 -07:00
Andy Dai
94bf9ae4e9
[Misc] Fix sampling from sonnet for long context case (#9235) 2024-10-11 00:33:16 +00:00
sroy745
f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) 2024-10-10 14:17:17 +08:00
youkaichao
18b296fdb2
[core] remove beam search from the core (#9105) 2024-10-07 05:47:04 +00:00
Brendan Wong
168cab6bbf
[Frontend] API support for beam search (#9087)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-10-05 23:39:03 -07:00
Cody Yu
27302dd584
[Misc] Fix CI lint (#9085) 2024-10-04 16:07:54 -07:00
Andy Dai
0cc566ca8f
[Misc] Add random seed for prefix cache benchmark (#9081) 2024-10-04 21:58:57 +00:00
Kuntai Du
fbb74420e7
[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412) 2024-10-04 14:01:44 -07:00