Russell Bryant
3be5b26a76
[CI/Build] Add shell script linting using shellcheck ( #7925 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 18:17:29 +00:00
Atlas
a62bc0109c
[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. ( #10105 )
...
Signed-off-by: Mozhou <spli161006@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-11-07 11:20:30 +00:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL ( #8464 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
lkchen
d2e80332a7
[Feature] Update benchmark_throughput.py to support image input ( #9851 )
...
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
2024-11-05 19:30:02 +00:00
lkchen
9a5664d4a4
[Misc] Refactor benchmark_throughput.py ( #9779 )
...
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <lkchen@github.com>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
2024-11-04 14:32:16 -08:00
Tran Quang Dai
ea4adeddc1
[Bugfix] Fix E2EL mean and median stats ( #9984 )
...
Signed-off-by: daitran2k1 <tranquangdai7a@gmail.com>
2024-11-04 09:37:58 +00:00
Guillaume Calmettes
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint ( #9837 )
2024-10-30 18:15:56 -07:00
wangshuai09
622b7ab955
[Hardware] using current_platform.seed_everything ( #9785 )
...
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-29 14:47:44 +00:00
youkaichao
32176fee73
[torch.compile] support moe models ( #9632 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-27 21:58:04 -07:00
Michael Goin
fd0e2cfdb2
[Misc] Separate total and output tokens in benchmark_throughput.py ( #8914 )
2024-10-23 16:47:20 +00:00
Chen Zhang
65050a40e6
[Bugfix] Generate exactly input_len tokens in benchmark_throughput ( #9592 )
2024-10-22 17:45:35 -07:00
Jeremy Arnold
cb6fdaa0a0
[Misc] Make benchmarks use EngineArgs ( #9529 )
2024-10-22 15:40:38 -07:00
Andy Dai
855e0e6f97
[Frontend][Misc] Goodput metric support ( #9338 )
2024-10-20 18:39:32 +00:00
Russell Bryant
7dbe738d65
[Misc] benchmark: Add option to set max concurrency ( #9390 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-18 11:15:28 -07:00
Kai Wu
d65049daab
[Bugfix] Add random_seed to sample_hf_requests in benchmark_serving script ( #9013 )
...
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-17 21:11:11 +00:00
Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default ( #8704 )
...
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
Cyrus Leung
7e7eae338d
[Misc] Standardize RoPE handling for Qwen2-VL ( #9250 )
2024-10-16 13:56:17 +08:00
Grace Ho
5d264f4ab8
pass ignore_eos parameter to all benchmark_serving calls ( #9349 )
2024-10-15 13:30:44 -07:00
Andy Dai
94bf9ae4e9
[Misc] Fix sampling from sonnet for long context case ( #9235 )
2024-10-11 00:33:16 +00:00
sroy745
f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 ( #9149 )
2024-10-10 14:17:17 +08:00
youkaichao
18b296fdb2
[core] remove beam search from the core ( #9105 )
2024-10-07 05:47:04 +00:00
Brendan Wong
168cab6bbf
[Frontend] API support for beam search ( #9087 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-10-05 23:39:03 -07:00
Cody Yu
27302dd584
[Misc] Fix CI lint ( #9085 )
2024-10-04 16:07:54 -07:00
Andy Dai
0cc566ca8f
[Misc] Add random seed for prefix cache benchmark ( #9081 )
2024-10-04 21:58:57 +00:00
Kuntai Du
fbb74420e7
[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang ( #7412 )
2024-10-04 14:01:44 -07:00
vlsav
22f5851b80
Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows ( #8997 )
2024-10-01 11:07:06 -07:00
Chen Zhang
e585b583a9
[Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 ( #8891 )
2024-09-28 18:51:22 +00:00
Peter Pan
0e088750af
[MISC] Fix invalid escape sequence '\' ( #8830 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2024-09-27 01:13:25 -07:00
Cyrus Leung
3b00b9c26c
[Core] renamePromptInputs
and inputs
( #8876 )
2024-09-26 20:35:15 -07:00
Simon Mo
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility ( #8760 ) ( #8810 )
2024-09-25 10:36:26 -07:00
Cyrus Leung
28e1299e60
rename PromptInputs and inputs with backward compatibility ( #8760 )
2024-09-25 09:36:47 -07:00
Archit Patke
6da1ab6b41
[Core] Adding Priority Scheduling ( #5958 )
2024-09-24 19:50:50 -07:00
Simon Mo
3185fb0cca
Revert "[Core] Rename PromptInputs
to PromptType
, and inputs
to prompt
" ( #8750 )
2024-09-24 05:45:20 +00:00
youkaichao
0250dd68c5
re-implement beam search on top of vllm core ( #8726 )
...
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
2024-09-23 22:08:12 -07:00
Lucas Wilkinson
86e9c8df29
[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin ( #7701 )
...
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-09-23 13:46:26 -04:00
Cyrus Leung
0057894ef7
[Core] Rename PromptInputs
and inputs
( #8673 )
2024-09-20 19:00:54 -07:00
Kunshang Ji
855c8ae2c9
[MISC] remove engine_use_ray in benchmark_throughput.py ( #8615 )
2024-09-18 22:33:20 -07:00
Kuntai Du
c52ec5f034
[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py ( #8616 )
2024-09-19 05:24:24 +00:00
Aaron Pham
9d104b5beb
[CI/Build] Update Ruff version ( #8469 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-18 11:00:56 +00:00
Cyrus Leung
6ffa3f314c
[CI/Build] Avoid CUDA initialization ( #8534 )
2024-09-18 10:38:11 +00:00
Isotr0py
1b6de8352b
[Benchmark] Support sample from HF datasets and image input for benchmark_serving ( #8495 )
2024-09-17 07:34:27 +00:00
Aarni Koskela
8baa454937
[Misc] Move device options to a single place ( #8322 )
2024-09-11 13:25:58 -07:00
Wei-Sheng Chin
795b662cff
Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) ( #8241 )
2024-09-06 20:18:16 -07:00
afeldman-nm
e5cab71531
[Frontend] Add --logprobs argument to benchmark_serving.py
( #8191 )
2024-09-06 09:01:14 -07:00
Cody Yu
77d9e514a2
[MISC] Replace input token throughput with total token throughput ( #8164 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-04 20:23:22 +00:00
Nick Hill
d4db9f53c8
[Benchmark] Add --async-engine
option to benchmark_throughput.py ( #7964 )
2024-09-03 20:57:41 -04:00
Wei-Sheng Chin
0c785d344d
Add more percentiles and latencies ( #7759 )
2024-08-29 16:48:11 -07:00
Philipp Schmid
345be0e244
[benchmark] Update TGI version ( #7917 )
2024-08-27 15:07:53 -07:00
Megha Agarwal
2eedede875
[Core] Asynchronous Output Processor ( #7049 )
...
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
2024-08-26 20:53:20 -07:00
Alexander Matveev
9db93de20c
[Core] Add multi-step support to LLMEngine ( #7789 )
2024-08-23 12:45:53 -07:00