Author | Commit | Message | Date
youkaichao | 18b296fdb2 | [core] remove beam search from the core (#9105) | 2024-10-07 05:47:04 +00:00
Kuntai Du | fbb74420e7 | [CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412) | 2024-10-04 14:01:44 -07:00
vlsav | 22f5851b80 | Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997) | 2024-10-01 11:07:06 -07:00
Chen Zhang | e585b583a9 | [Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 (#8891) | 2024-09-28 18:51:22 +00:00
Peter Pan | 0e088750af | [MISC] Fix invalid escape sequence '\' (#8830) (Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>) | 2024-09-27 01:13:25 -07:00
Kuntai Du | c52ec5f034 | [Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616) | 2024-09-19 05:24:24 +00:00
Isotr0py | 1b6de8352b | [Benchmark] Support sample from HF datasets and image input for benchmark_serving (#8495) | 2024-09-17 07:34:27 +00:00
Wei-Sheng Chin | 795b662cff | Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241) | 2024-09-06 20:18:16 -07:00
afeldman-nm | e5cab71531 | [Frontend] Add --logprobs argument to benchmark_serving.py (#8191) | 2024-09-06 09:01:14 -07:00
Cody Yu | 77d9e514a2 | [MISC] Replace input token throughput with total token throughput (#8164) (Co-authored-by: Michael Goin <michael@neuralmagic.com>) | 2024-09-04 20:23:22 +00:00
Wei-Sheng Chin | 0c785d344d | Add more percentiles and latencies (#7759) | 2024-08-29 16:48:11 -07:00
William Lin | dd53c4b023 | [misc] Add Torch profiler support (#7451) (Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>) | 2024-08-21 15:39:26 -07:00
Fish | ccb20db8bd | [Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests' (#6428) | 2024-07-14 19:27:01 -07:00
Ethan Xu | dbfe254eda | [Feature] vLLM CLI (#5090) (Co-authored-by: simon-mo <simon.mo@hey.com>) | 2024-07-14 15:36:43 -07:00
Kuntai Du | a4feba929b | [CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362) | 2024-07-11 13:28:38 -07:00
Haichuan | 717f4bcea0 | Feature/add benchmark testing (#5947) (Co-authored-by: Roger Wang <ywang@roblox.com>) | 2024-07-08 07:52:06 +00:00
Haichuan | 333306a252 | add benchmark for fix length input and output (#5857) (Co-authored-by: Roger Wang <ywang@roblox.com>) | 2024-07-07 07:42:13 +00:00
Michael Goin | 8065a7e220 | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00
zhyncs | 1f12122b17 | [Misc] use AutoTokenizer for benchmark serving when vLLM not installed (#5588) | 2024-06-17 09:40:35 -07:00
Cyrus Leung | 0e9164b40a | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00
Kuntai Du | 319ad7f1d3 | [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label (#5073) (Co-authored-by: simon-mo <simon.mo@hey.com>) | 2024-06-13 22:36:20 -07:00
Tyler Michael Smith | 02cc3b51a7 | [misc] benchmark_serving.py -- add ITL results and tweak TPOT results (#5263) | 2024-06-05 10:17:51 -07:00
Roger Wang | f17a1a8f96 | [Misc] Make Serving Benchmark More User-friendly (#5044) | 2024-05-25 17:28:16 +00:00
Kuntai Du | c3af44722c | [Doc] Add documentation to benchmarking script when running TGI (#4920) | 2024-05-20 20:16:57 +00:00
Roger Wang | 7923dcad12 | [Misc] Update ShareGPT Dataset Sampling in Serving Benchmark (#4279) | 2024-04-24 09:49:13 -07:00
Chang Su | 819a309c0f | [Bugfix] Fix args in benchmark_serving (#3836) (Co-authored-by: Roger Wang <ywang@roblox.com>) | 2024-04-04 07:41:05 +00:00
Roger Wang | 45b6ef6513 | feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277) | 2024-03-27 13:39:26 -07:00
SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00
Simon Mo | 8e67598aa6 | [Misc] fix line length for entire codebase (#3444) | 2024-03-16 00:36:29 -07:00
TianYu GUO | 1ece1ae829 | [Minor Fix] Fix comments in benchmark_serving (#3252) | 2024-03-07 22:22:59 -08:00
Massimiliano Pronesti | 93dc5a2870 | chore(vllm): codespell for spell checking (#2820) | 2024-02-21 18:56:01 -08:00
Ronen Schaffer | d7f396486e | Update comment (#2934) | 2024-02-21 18:18:37 -08:00
Roger Wang | a4211a4dc3 | Serving Benchmark Refactoring (#2433) | 2024-02-12 22:53:00 -08:00
Simon Mo | 1e4277d2d1 | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00
Harry Mellor | 63e835cbcc | Fix progress bar and allow HTTPS in benchmark_serving.py (#2552) | 2024-01-22 14:40:31 -08:00
Harry Mellor | 2709c0009a | Support OpenAI API server in benchmark_serving.py (#2172) | 2024-01-18 20:34:08 -08:00
Antoni Baum | acbed3ef40 | Use monotonic time where appropriate (#1249) | 2023-10-02 19:22:05 -07:00
Ricardo Lu | 8c4b2592fb | fix: enable trust-remote-code in api server & benchmark. (#509) | 2023-07-19 17:06:15 -07:00
Woosuk Kwon | 4338cc4750 | [Tokenizer] Add an option to specify tokenizer (#284) | 2023-06-28 09:46:58 -07:00
Zhuohan Li | 43710e8d09 | [Fix] Fix default port number in benchmark scripts (#265) | 2023-06-26 13:15:35 -07:00
Woosuk Kwon | 3f92038b99 | Add comments on swap space (#154) | 2023-06-18 11:39:35 -07:00
Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00
Zhuohan Li | e5464ee484 | Rename servers to engines (#152) | 2023-06-17 17:25:21 +08:00
Woosuk Kwon | 311490a720 | Add script for benchmarking serving throughput (#145) | 2023-06-14 19:55:38 -07:00