55 Commits

Author SHA1 Message Date
gujing
936db119ed
benchmark_serving support --served-model-name param (#12109)
Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-01-19 09:59:56 +00:00
Alexander Matveev
238c0d93b4
[Misc] Add tokenizer_mode param to benchmark_serving.py (#11174)
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
2024-12-13 16:19:10 +00:00
Roger Wang
c11f172187
[Misc] Adding MMMU-Pro vision dataset to serving benchmark (#10804)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-01 08:47:05 +00:00
Jaehyun An
8b6725b0cf
[Misc] Update benchmark to support image_url file or http (#10287)
Signed-off-by: rbbang <anjaehyun87@gmail.com>
2024-11-16 18:15:40 +08:00
Atlas
a62bc0109c
[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (#10105)
Signed-off-by: Mozhou <spli161006@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-11-07 11:20:30 +00:00
Tran Quang Dai
ea4adeddc1
[Bugfix] Fix E2EL mean and median stats (#9984)
Signed-off-by: daitran2k1 <tranquangdai7a@gmail.com>
2024-11-04 09:37:58 +00:00
Andy Dai
855e0e6f97
[Frontend][Misc] Goodput metric support (#9338) 2024-10-20 18:39:32 +00:00
Russell Bryant
7dbe738d65
[Misc] benchmark: Add option to set max concurrency (#9390)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-18 11:15:28 -07:00
Kai Wu
d65049daab
[Bugfix] Add random_seed to sample_hf_requests in benchmark_serving script (#9013)
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-17 21:11:11 +00:00
Grace Ho
5d264f4ab8
pass ignore_eos parameter to all benchmark_serving calls (#9349) 2024-10-15 13:30:44 -07:00
Andy Dai
94bf9ae4e9
[Misc] Fix sampling from sonnet for long context case (#9235) 2024-10-11 00:33:16 +00:00
youkaichao
18b296fdb2
[core] remove beam search from the core (#9105) 2024-10-07 05:47:04 +00:00
Kuntai Du
fbb74420e7
[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412) 2024-10-04 14:01:44 -07:00
vlsav
22f5851b80
Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997) 2024-10-01 11:07:06 -07:00
Chen Zhang
e585b583a9
[Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 (#8891) 2024-09-28 18:51:22 +00:00
Peter Pan
0e088750af
[MISC] Fix invalid escape sequence '\' (#8830)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2024-09-27 01:13:25 -07:00
Kuntai Du
c52ec5f034
[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616) 2024-09-19 05:24:24 +00:00
Isotr0py
1b6de8352b
[Benchmark] Support sample from HF datasets and image input for benchmark_serving (#8495) 2024-09-17 07:34:27 +00:00
Wei-Sheng Chin
795b662cff
Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241) 2024-09-06 20:18:16 -07:00
afeldman-nm
e5cab71531
[Frontend] Add --logprobs argument to benchmark_serving.py (#8191) 2024-09-06 09:01:14 -07:00
Cody Yu
77d9e514a2
[MISC] Replace input token throughput with total token throughput (#8164)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-04 20:23:22 +00:00
Wei-Sheng Chin
0c785d344d
Add more percentiles and latencies (#7759) 2024-08-29 16:48:11 -07:00
William Lin
dd53c4b023
[misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
Fish
ccb20db8bd
[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests' (#6428) 2024-07-14 19:27:01 -07:00
Ethan Xu
dbfe254eda
[Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00
Kuntai Du
a4feba929b
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362) 2024-07-11 13:28:38 -07:00
Haichuan
717f4bcea0
Feature/add benchmark testing (#5947)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-08 07:52:06 +00:00
Haichuan
333306a252
add benchmark for fix length input and output (#5857)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-07 07:42:13 +00:00
Michael Goin
8065a7e220
[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) 2024-06-20 17:00:13 -06:00
zhyncs
1f12122b17
[Misc] use AutoTokenizer for benchmark serving when vLLM not installed (#5588) 2024-06-17 09:40:35 -07:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
Kuntai Du
319ad7f1d3
[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label (#5073)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-13 22:36:20 -07:00
Tyler Michael Smith
02cc3b51a7
[misc] benchmark_serving.py -- add ITL results and tweak TPOT results (#5263) 2024-06-05 10:17:51 -07:00
Roger Wang
f17a1a8f96
[Misc] Make Serving Benchmark More User-friendly (#5044) 2024-05-25 17:28:16 +00:00
Kuntai Du
c3af44722c
[Doc]Add documentation to benchmarking script when running TGI (#4920) 2024-05-20 20:16:57 +00:00
Roger Wang
7923dcad12
[Misc] Update ShareGPT Dataset Sampling in Serving Benchmark (#4279) 2024-04-24 09:49:13 -07:00
Chang Su
819a309c0f
[Bugfix] Fix args in benchmark_serving (#3836)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-04-04 07:41:05 +00:00
Roger Wang
45b6ef6513
feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277) 2024-03-27 13:39:26 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
Simon Mo
8e67598aa6
[Misc] fix line length for entire codebase (#3444) 2024-03-16 00:36:29 -07:00
TianYu GUO
1ece1ae829
[Minor Fix] Fix comments in benchmark_serving (#3252) 2024-03-07 22:22:59 -08:00
Massimiliano Pronesti
93dc5a2870
chore(vllm): codespell for spell checking (#2820) 2024-02-21 18:56:01 -08:00
Ronen Schaffer
d7f396486e
Update comment (#2934) 2024-02-21 18:18:37 -08:00
Roger Wang
a4211a4dc3
Serving Benchmark Refactoring (#2433) 2024-02-12 22:53:00 -08:00
Simon Mo
1e4277d2d1
lint: format all python file instead of just source code (#2567) 2024-01-23 15:53:06 -08:00
Harry Mellor
63e835cbcc
Fix progress bar and allow HTTPS in benchmark_serving.py (#2552) 2024-01-22 14:40:31 -08:00
Harry Mellor
2709c0009a
Support OpenAI API server in benchmark_serving.py (#2172) 2024-01-18 20:34:08 -08:00
Antoni Baum
acbed3ef40
Use monotonic time where appropriate (#1249) 2023-10-02 19:22:05 -07:00
Ricardo Lu
8c4b2592fb
fix: enable trust-remote-code in api server & benchmark. (#509) 2023-07-19 17:06:15 -07:00
Woosuk Kwon
4338cc4750
[Tokenizer] Add an option to specify tokenizer (#284) 2023-06-28 09:46:58 -07:00