vllm/docs/source/contributing/profiling/profiling_index.md

# Profiling vLLM

We support tracing vLLM workers using the `torch.profiler` module. You can enable tracing by setting the `VLLM_TORCH_PROFILER_DIR` environment variable to the directory where you want to save the traces: `VLLM_TORCH_PROFILER_DIR=/mnt/traces/`

The OpenAI server also needs to be started with the `VLLM_TORCH_PROFILER_DIR` environment variable set.

When using `benchmarks/benchmark_serving.py`, you can enable profiling by passing the `--profile` flag.

```{warning}
Only enable profiling in a development environment.
```

Traces can be visualized using <https://ui.perfetto.dev/>.

```{tip}
Only send a few requests through vLLM when profiling, as the traces can get quite large. Also, no need to untar the traces, they can be viewed directly.
```

```{tip}
To stop the profiler - it flushes out all the profile trace files to the directory. This takes time, for example for about 100 requests worth of data for a llama 70b, it takes about 10 minutes to flush out on a H100.
Set the env variable VLLM_RPC_TIMEOUT to a big number before you start the server. Say something like 30 minutes.
`export VLLM_RPC_TIMEOUT=1800000`
```

## Example commands and usage

### Offline Inference

Refer to <gh-file:examples/offline_inference/offline_inference_with_profiler.py> for an example.

### OpenAI Server

```bash
VLLM_TORCH_PROFILER_DIR=./vllm_profile python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B
```

benchmark_serving.py:

```bash
python benchmarks/benchmark_serving.py --backend vllm --model meta-llama/Meta-Llama-3-70B --dataset-name sharegpt --dataset-path sharegpt.json --profile --num-prompts 2
```
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00			`# Profiling vLLM`

			We support tracing vLLM workers using the `torch.profiler` module. You can enable tracing by setting the `VLLM_TORCH_PROFILER_DIR` environment variable to the directory where you want to save the traces: `VLLM_TORCH_PROFILER_DIR=/mnt/traces/`

			The OpenAI server also needs to be started with the `VLLM_TORCH_PROFILER_DIR` environment variable set.

			When using `benchmarks/benchmark_serving.py`, you can enable profiling by passing the `--profile` flag.

			```{warning}
			`Only enable profiling in a development environment.`
			```

			`Traces can be visualized using <https://ui.perfetto.dev/>.`

			```{tip}
			`Only send a few requests through vLLM when profiling, as the traces can get quite large. Also, no need to untar the traces, they can be viewed directly.`
			```

			```{tip}
			`To stop the profiler - it flushes out all the profile trace files to the directory. This takes time, for example for about 100 requests worth of data for a llama 70b, it takes about 10 minutes to flush out on a H100.`
			`Set the env variable VLLM_RPC_TIMEOUT to a big number before you start the server. Say something like 30 minutes.`
			`export VLLM_RPC_TIMEOUT=1800000`
			```

[Doc] Improve GitHub links (#11491) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2024-12-26 06:49:26 +08:00			`## Example commands and usage`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00
[Doc] Improve GitHub links (#11491) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2024-12-26 06:49:26 +08:00			`### Offline Inference`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00
[Doc] Move examples into categories (#11840) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> 2025-01-08 13:09:53 +00:00			`Refer to <gh-file:examples/offline_inference/offline_inference_with_profiler.py> for an example.`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00
[Doc] Improve GitHub links (#11491) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 2024-12-26 06:49:26 +08:00			`### OpenAI Server`
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00
			```bash
			`VLLM_TORCH_PROFILER_DIR=./vllm_profile python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B`
			```

			`benchmark_serving.py:`

			```bash
			`python benchmarks/benchmark_serving.py --backend vllm --model meta-llama/Meta-Llama-3-70B --dataset-name sharegpt --dataset-path sharegpt.json --profile --num-prompts 2`
			```