Author | Commit | Message | Date
youkaichao | 18b296fdb2 | [core] remove beam search from the core (#9105) | 2024-10-07 05:47:04 +00:00
Kuntai Du | fbb74420e7 | [CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412) | 2024-10-04 14:01:44 -07:00
vlsav | 22f5851b80 | Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997) | 2024-10-01 11:07:06 -07:00
Chen Zhang | e585b583a9 | [Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 (#8891) | 2024-09-28 18:51:22 +00:00
Peter Pan | 0e088750af | [MISC] Fix invalid escape sequence '\' (#8830) (Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>) | 2024-09-27 01:13:25 -07:00
Kuntai Du | c52ec5f034 | [Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616) | 2024-09-19 05:24:24 +00:00
Isotr0py | 1b6de8352b | [Benchmark] Support sample from HF datasets and image input for benchmark_serving (#8495) | 2024-09-17 07:34:27 +00:00
Wei-Sheng Chin | 795b662cff | Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241) | 2024-09-06 20:18:16 -07:00
afeldman-nm | e5cab71531 | [Frontend] Add --logprobs argument to benchmark_serving.py (#8191) | 2024-09-06 09:01:14 -07:00
Cody Yu | 77d9e514a2 | [MISC] Replace input token throughput with total token throughput (#8164) (Co-authored-by: Michael Goin <michael@neuralmagic.com>) | 2024-09-04 20:23:22 +00:00
Wei-Sheng Chin | 0c785d344d | Add more percentiles and latencies (#7759) | 2024-08-29 16:48:11 -07:00
William Lin | dd53c4b023 | [misc] Add Torch profiler support (#7451) (Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>) | 2024-08-21 15:39:26 -07:00
Fish | ccb20db8bd | [Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests' (#6428) | 2024-07-14 19:27:01 -07:00
Ethan Xu | dbfe254eda | [Feature] vLLM CLI (#5090) (Co-authored-by: simon-mo <simon.mo@hey.com>) | 2024-07-14 15:36:43 -07:00
Kuntai Du | a4feba929b | [CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362) | 2024-07-11 13:28:38 -07:00
Haichuan | 717f4bcea0 | Feature/add benchmark testing (#5947) (Co-authored-by: Roger Wang <ywang@roblox.com>) | 2024-07-08 07:52:06 +00:00
Haichuan | 333306a252 | add benchmark for fix length input and output (#5857) (Co-authored-by: Roger Wang <ywang@roblox.com>) | 2024-07-07 07:42:13 +00:00
Michael Goin | 8065a7e220 | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00
zhyncs | 1f12122b17 | [Misc] use AutoTokenizer for benchmark serving when vLLM not installed (#5588) | 2024-06-17 09:40:35 -07:00
Cyrus Leung | 0e9164b40a | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00
Kuntai Du | 319ad7f1d3 | [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label (#5073) (Co-authored-by: simon-mo <simon.mo@hey.com>) | 2024-06-13 22:36:20 -07:00
Tyler Michael Smith | 02cc3b51a7 | [misc] benchmark_serving.py -- add ITL results and tweak TPOT results (#5263) | 2024-06-05 10:17:51 -07:00
Roger Wang | f17a1a8f96 | [Misc] Make Serving Benchmark More User-friendly (#5044) | 2024-05-25 17:28:16 +00:00
Kuntai Du | c3af44722c | [Doc] Add documentation to benchmarking script when running TGI (#4920) | 2024-05-20 20:16:57 +00:00
Roger Wang | 7923dcad12 | [Misc] Update ShareGPT Dataset Sampling in Serving Benchmark (#4279) | 2024-04-24 09:49:13 -07:00
Chang Su | 819a309c0f | [Bugfix] Fix args in benchmark_serving (#3836) (Co-authored-by: Roger Wang <ywang@roblox.com>) | 2024-04-04 07:41:05 +00:00
Roger Wang | 45b6ef6513 | feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277) | 2024-03-27 13:39:26 -07:00
SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00
Simon Mo | 8e67598aa6 | [Misc] fix line length for entire codebase (#3444) | 2024-03-16 00:36:29 -07:00
TianYu GUO | 1ece1ae829 | [Minor Fix] Fix comments in benchmark_serving (#3252) | 2024-03-07 22:22:59 -08:00
Massimiliano Pronesti | 93dc5a2870 | chore(vllm): codespell for spell checking (#2820) | 2024-02-21 18:56:01 -08:00
Ronen Schaffer | d7f396486e | Update comment (#2934) | 2024-02-21 18:18:37 -08:00
Roger Wang | a4211a4dc3 | Serving Benchmark Refactoring (#2433) | 2024-02-12 22:53:00 -08:00
Simon Mo | 1e4277d2d1 | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00
Harry Mellor | 63e835cbcc | Fix progress bar and allow HTTPS in benchmark_serving.py (#2552) | 2024-01-22 14:40:31 -08:00
Harry Mellor | 2709c0009a | Support OpenAI API server in benchmark_serving.py (#2172) | 2024-01-18 20:34:08 -08:00
Antoni Baum | acbed3ef40 | Use monotonic time where appropriate (#1249) | 2023-10-02 19:22:05 -07:00
Ricardo Lu | 8c4b2592fb | fix: enable trust-remote-code in api server & benchmark. (#509) | 2023-07-19 17:06:15 -07:00
Woosuk Kwon | 4338cc4750 | [Tokenizer] Add an option to specify tokenizer (#284) | 2023-06-28 09:46:58 -07:00
Zhuohan Li | 43710e8d09 | [Fix] Fix default port number in benchmark scripts (#265) | 2023-06-26 13:15:35 -07:00
Woosuk Kwon | 3f92038b99 | Add comments on swap space (#154) | 2023-06-18 11:39:35 -07:00
Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00
Zhuohan Li | e5464ee484 | Rename servers to engines (#152) | 2023-06-17 17:25:21 +08:00
Woosuk Kwon | 311490a720 | Add script for benchmarking serving throughput (#145) | 2023-06-14 19:55:38 -07:00