[Doc] Update benchmarks README (#14646)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
This commit is contained in: commit 4a42b9f5d6 (parent 47532cd9f4)

# Benchmarking vLLM

This README guides you through running benchmark tests with the extensive
datasets supported on vLLM. It’s a living document, updated as new features and datasets
become available.

## Dataset Overview

<table style="width:100%; border-collapse: collapse;">
  <thead>
    <tr>
      <th style="width:15%; text-align: left;">Dataset</th>
      <th style="width:10%; text-align: center;">Online</th>
      <th style="width:10%; text-align: center;">Offline</th>
      <th style="width:65%; text-align: left;">Data Path</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ShareGPT</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json</code></td>
    </tr>
    <tr>
      <td><strong>BurstGPT</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv</code></td>
    </tr>
    <tr>
      <td><strong>Sonnet</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td>Local file: <code>benchmarks/sonnet.txt</code></td>
    </tr>
    <tr>
      <td><strong>Random</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">✅</td>
      <td><code>synthetic</code></td>
    </tr>
    <tr>
      <td><strong>HuggingFace</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">🚧</td>
      <td>Specify your dataset path on HuggingFace</td>
    </tr>
    <tr>
      <td><strong>VisionArena</strong></td>
      <td style="text-align: center;">✅</td>
      <td style="text-align: center;">🚧</td>
      <td><code>lmarena-ai/vision-arena-bench-v0.1</code> (a HuggingFace dataset)</td>
    </tr>
  </tbody>
</table>

✅: supported

🚧: to be supported

**Note**: VisionArena’s `dataset-name` should be set to `hf`
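
If you prefer to grab the file-based datasets up front, the `wget` commands from the table can be run together; the `benchmark_data` directory below is only an illustrative choice, not something the scripts require:

```bash
# Illustrative layout: download the two file-based datasets into ./benchmark_data
mkdir -p benchmark_data && cd benchmark_data
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
wget https://github.com/HPMLL/BurstGPT/releases/download/v1.1/BurstGPT_without_fails_2.csv
```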

---

## Example - Online Benchmark

First start serving your model

```bash
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
vllm serve ${MODEL_NAME} --disable-log-requests
```
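
Before launching the benchmark it can help to confirm the server is ready; one way is to poll the OpenAI-compatible `/v1/models` endpoint (assuming the default port 8000):

```bash
# Poll until the OpenAI-compatible API answers, then print the served model list.
until curl -s http://localhost:8000/v1/models > /dev/null; do
  sleep 1
done
curl -s http://localhost:8000/v1/models
```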

Then run the benchmarking script

```bash
# download dataset
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
NUM_PROMPTS=10
BACKEND="openai-chat"
DATASET_NAME="sharegpt"
DATASET_PATH="<your data path>/ShareGPT_V3_unfiltered_cleaned_split.json"

python3 benchmarks/benchmark_serving.py \
  --backend "${BACKEND}" \
  --model "${MODEL_NAME}" \
  --endpoint /v1/chat/completions \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}"
```

If successful, you will see the following output

```
============ Serving Benchmark Result ============
Successful requests:                     10
Benchmark duration (s):                  5.78
Total input tokens:                      1369
Total generated tokens:                  2212
Request throughput (req/s):              1.73
Output token throughput (tok/s):         382.89
Total Token throughput (tok/s):          619.85
---------------Time to First Token----------------
Mean TTFT (ms):                          71.54
Median TTFT (ms):                        73.88
P99 TTFT (ms):                           79.49
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          7.91
Median TPOT (ms):                        7.96
P99 TPOT (ms):                           8.03
---------------Inter-token Latency----------------
Mean ITL (ms):                           7.74
Median ITL (ms):                         7.70
P99 ITL (ms):                            8.39
==================================================
```
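
To keep these numbers around for later comparison, `benchmark_serving.py` can also write them to a JSON file. The result-saving flags below are assumptions based on recent versions of the script, so check `python3 benchmarks/benchmark_serving.py --help` if they are not recognized:

```bash
# Assumed flags: --save-result / --result-dir write the metrics above as JSON.
python3 benchmarks/benchmark_serving.py \
  --backend "${BACKEND}" \
  --model "${MODEL_NAME}" \
  --endpoint /v1/chat/completions \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}" \
  --save-result \
  --result-dir ./benchmark_results
```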

### VisionArena Benchmark for Vision Language Models

```bash
# need a model with vision capability here
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
```

```bash
MODEL_NAME="Qwen/Qwen2-VL-7B-Instruct"
NUM_PROMPTS=10
BACKEND="openai-chat"
DATASET_NAME="hf"
DATASET_PATH="lmarena-ai/vision-arena-bench-v0.1"
DATASET_SPLIT='train'

python3 benchmarks/benchmark_serving.py \
  --backend "${BACKEND}" \
  --model "${MODEL_NAME}" \
  --endpoint "/v1/chat/completions" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --hf-split "${DATASET_SPLIT}" \
  --num-prompts "${NUM_PROMPTS}"
```
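
The same pattern should extend to other HuggingFace-hosted datasets from the overview table; the dataset path below is a placeholder rather than a tested example:

```bash
# Placeholder HuggingFace dataset; swap in a dataset the script actually supports.
python3 benchmarks/benchmark_serving.py \
  --backend "openai-chat" \
  --model "${MODEL_NAME}" \
  --endpoint "/v1/chat/completions" \
  --dataset-name "hf" \
  --dataset-path "<hf-org>/<hf-dataset>" \
  --hf-split "train" \
  --num-prompts "${NUM_PROMPTS}"
```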

---

## Example - Offline Throughput Benchmark

```bash
MODEL_NAME="NousResearch/Hermes-3-Llama-3.1-8B"
NUM_PROMPTS=10
DATASET_NAME="sonnet"
DATASET_PATH="benchmarks/sonnet.txt"

python3 benchmarks/benchmark_throughput.py \
  --model "${MODEL_NAME}" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}"
```

If successful, you will see the following output

```
Throughput: 7.35 requests/s, 4789.20 total tokens/s, 1102.83 output tokens/s
```
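
If you want the throughput figure in machine-readable form, the script can also emit a JSON summary; `--output-json` is an assumption based on recent versions of `benchmark_throughput.py`, so verify it with `--help`:

```bash
# Assumed flag: --output-json writes the throughput summary to a file.
python3 benchmarks/benchmark_throughput.py \
  --model "${MODEL_NAME}" \
  --dataset-name "${DATASET_NAME}" \
  --dataset-path "${DATASET_PATH}" \
  --num-prompts "${NUM_PROMPTS}" \
  --output-json ./throughput_result.json
```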

### Benchmark with LoRA Adapters

```bash
MODEL_NAME="meta-llama/Llama-2-7b-hf"
BACKEND="vllm"
DATASET_NAME="sharegpt"
DATASET_PATH="/home/jovyan/data/vllm_benchmark_datasets/ShareGPT_V3_unfiltered_cleaned_split.json"
NUM_PROMPTS=10
MAX_LORAS=2
MAX_LORA_RANK=8
ENABLE_LORA="--enable-lora"
LORA_PATH="yard1/llama-2-7b-sql-lora-test"

python3 benchmarks/benchmark_throughput.py \
  --model "${MODEL_NAME}" \
  --backend "${BACKEND}" \
  --dataset-path "${DATASET_PATH}" \
  --dataset-name "${DATASET_NAME}" \
  --num-prompts "${NUM_PROMPTS}" \
  --max-loras "${MAX_LORAS}" \
  --max-lora-rank "${MAX_LORA_RANK}" \
  ${ENABLE_LORA} \
  --lora-path "${LORA_PATH}"
```