diff --git a/benchmarks/README.md b/benchmarks/README.md
index 367ef934..edc10d8b 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -1,29 +1,181 @@
 # Benchmarking vLLM
 
-## Downloading the ShareGPT dataset
+This README guides you through running benchmark tests with the wide range of
+datasets supported by vLLM. It’s a living document, updated as new features and
+datasets become available.
 
-You can download the dataset by running:
+## Dataset Overview
+
+| Dataset | Online | Offline | Data Path |
+|---------|:------:|:-------:|-----------|
+| ShareGPT | ✅ | ✅ | `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json` |
+| BurstGPT | ✅ | ✅ | `wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv` |
+| Sonnet | ✅ | ✅ | Local file: `benchmarks/sonnet.txt` |
+| Random | ✅ | ✅ | synthetic |
+| HuggingFace | ✅ | 🚧 | Specify your dataset path on HuggingFace |
+| VisionArena | ✅ | 🚧 | `lmarena-ai/vision-arena-bench-v0.1` (a HuggingFace dataset) |
+
+✅: supported
+
+🚧: to be supported
+
+**Note**: When benchmarking VisionArena, set `dataset-name` to `hf`.
+
+---
+## Example - Online Benchmark
+
+First, start serving your model:
 
 ```bash
-wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
+vllm serve "${MODEL_NAME}" --disable-log-requests
 ```
 
-## Downloading the ShareGPT4V dataset
-
-The json file refers to several image datasets (coco, llava, etc.). The benchmark scripts
-will ignore a datapoint if the referred image is missing.
+Then, run the benchmarking script:
 
 ```bash
-wget https://huggingface.co/datasets/Lin-Chen/ShareGPT4V/resolve/main/sharegpt4v_instruct_gpt4-vision_cap100k.json
-mkdir coco -p
-wget http://images.cocodataset.org/zips/train2017.zip -O coco/train2017.zip
-unzip coco/train2017.zip -d coco/
+# Download the dataset first if you have not already done so:
+# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
+NUM_PROMPTS=10
+BACKEND="openai-chat"
+DATASET_NAME="sharegpt"
+DATASET_PATH="./ShareGPT_V3_unfiltered_cleaned_split.json"
+
+python3 benchmarks/benchmark_serving.py \
+    --backend "${BACKEND}" \
+    --model "${MODEL_NAME}" \
+    --endpoint "/v1/chat/completions" \
+    --dataset-name "${DATASET_NAME}" \
+    --dataset-path "${DATASET_PATH}" \
+    --num-prompts "${NUM_PROMPTS}"
 ```
 
-# Downloading the BurstGPT dataset
+If successful, you should see output similar to the following:
 
-You can download the BurstGPT v1.1 dataset by running:
+```
+============ Serving Benchmark Result ============
+Successful requests:                     10
+Benchmark duration (s):                  5.78
+Total input tokens:                      1369
+Total generated tokens:                  2212
+Request throughput (req/s):              1.73
+Output token throughput (tok/s):         382.89
+Total Token throughput (tok/s):          619.85
+---------------Time to First Token----------------
+Mean TTFT (ms):                          71.54
+Median TTFT (ms):                        73.88
+P99 TTFT (ms):                           79.49
+-----Time per Output Token (excl. 1st token)------
+Mean TPOT (ms):                          7.91
+Median TPOT (ms):                        7.96
+P99 TPOT (ms):                           8.03
+---------------Inter-token Latency----------------
+Mean ITL (ms):                           7.74
+Median ITL (ms):                         7.70
+P99 ITL (ms):                            8.39
+==================================================
+```
+
+### VisionArena Benchmark for Vision Language Models
 
 ```bash
-wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv
+# A model with vision capability is required here.
+vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
 ```
+
+```bash
+MODEL_NAME="Qwen/Qwen2-VL-7B-Instruct"
+NUM_PROMPTS=10
+BACKEND="openai-chat"
+DATASET_NAME="hf"
+DATASET_PATH="lmarena-ai/vision-arena-bench-v0.1"
+DATASET_SPLIT="train"
+
+python3 benchmarks/benchmark_serving.py \
+    --backend "${BACKEND}" \
+    --model "${MODEL_NAME}" \
+    --endpoint "/v1/chat/completions" \
+    --dataset-name "${DATASET_NAME}" \
+    --dataset-path "${DATASET_PATH}" \
+    --hf-split "${DATASET_SPLIT}" \
+    --num-prompts "${NUM_PROMPTS}"
+```
+
+---
+## Example - Offline Throughput Benchmark
+
+```bash
+MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
+NUM_PROMPTS=10
+DATASET_NAME="sonnet"
+DATASET_PATH="benchmarks/sonnet.txt"
+
+python3 benchmarks/benchmark_throughput.py \
+    --model "${MODEL_NAME}" \
+    --dataset-name "${DATASET_NAME}" \
+    --dataset-path "${DATASET_PATH}" \
+    --num-prompts "${NUM_PROMPTS}"
+```
+
+If successful, you should see output similar to the following:
+
+```
+Throughput: 7.35 requests/s, 4789.20 total tokens/s, 1102.83 output tokens/s
+```
+
+### Benchmark with LoRA Adapters
+
+```bash
+MODEL_NAME="meta-llama/Llama-2-7b-hf"
+BACKEND="vllm"
+DATASET_NAME="sharegpt"
+DATASET_PATH="./ShareGPT_V3_unfiltered_cleaned_split.json"
+NUM_PROMPTS=10
+MAX_LORAS=2
+MAX_LORA_RANK=8
+ENABLE_LORA="--enable-lora"
+LORA_PATH="yard1/llama-2-7b-sql-lora-test"
+
+python3 benchmarks/benchmark_throughput.py \
+    --model "${MODEL_NAME}" \
+    --backend "${BACKEND}" \
+    --dataset-path "${DATASET_PATH}" \
+    --dataset-name "${DATASET_NAME}" \
+    --num-prompts "${NUM_PROMPTS}" \
+    --max-loras "${MAX_LORAS}" \
+    --max-lora-rank "${MAX_LORA_RANK}" \
+    ${ENABLE_LORA} \
+    --lora-path "${LORA_PATH}"
+```
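The throughput lines in the sample online benchmark result are consistent with totals divided by the benchmark duration: requests / duration, generated tokens / duration, and (input + generated tokens) / duration. The following is a minimal shell sketch, not part of vLLM, that recomputes them from a saved copy of the result block. The function name `recompute_throughput` and the input file are hypothetical, and it assumes the `Label:   value` layout shown in the sample output.

```bash
# Minimal sketch (not part of vLLM): recompute the throughput figures of a
# saved benchmark_serving.py result block from its reported totals.
# Assumes the "Label:   value" layout of the sample output; the function
# name is hypothetical.
recompute_throughput() {
  awk -F': *' '
    /^Successful requests:/    { reqs = $2 }
    /^Benchmark duration/      { dur = $2 }
    /^Total input tokens:/     { inp = $2 }
    /^Total generated tokens:/ { gen = $2 }
    END {
      # Refuse to divide by zero if the duration line is missing.
      if (dur + 0 == 0) {
        print "error: no benchmark duration found" > "/dev/stderr"
        exit 1
      }
      # Each throughput figure is a total divided by the duration.
      printf "req/s=%.2f output_tok/s=%.2f total_tok/s=%.2f\n", reqs / dur, gen / dur, (inp + gen) / dur
    }
  ' "$1"
}

# Example (hypothetical file name):
# recompute_throughput serving_result.txt
```

With the sample result saved to a file, this prints `req/s=1.73 output_tok/s=382.70 total_tok/s=619.55`. The small gap from the reported 382.89 and 619.85 tok/s is expected, because the printed duration is rounded to two decimals while the script presumably divides by the unrounded value.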