From 9d2b4a70f43e91f49c3d2fd449d586ff8f25e31f Mon Sep 17 00:00:00 2001
From: Mark McLoughlin
Date: Fri, 14 Mar 2025 16:45:25 +0000
Subject: [PATCH] [V1][Metrics] Updated list of deprecated metrics in v0.8 (#14695)

Signed-off-by: Mark McLoughlin
---
 docs/source/serving/metrics.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/source/serving/metrics.md b/docs/source/serving/metrics.md
index 1d55f201..647ece3f 100644
--- a/docs/source/serving/metrics.md
+++ b/docs/source/serving/metrics.md
@@ -39,7 +39,16 @@ The following metrics are exposed:
 
 The following metrics are deprecated and due to be removed in a future version:
 
-- *(No metrics are currently deprecated)*
+- `vllm:num_requests_swapped`, `vllm:cpu_cache_usage_perc`, and
+  `vllm:cpu_prefix_cache_hit_rate` because KV cache offloading is not
+  used in V1.
+- `vllm:gpu_prefix_cache_hit_rate` is replaced by queries+hits
+  counters in V1.
+- `vllm:time_in_queue_requests` because it duplicates
+  `vllm:request_queue_time_seconds`.
+- `vllm:model_forward_time_milliseconds` and
+  `vllm:model_execute_time_milliseconds` because
+  prefill/decode/inference time metrics should be used instead.
 
 Note: when metrics are deprecated in version `X.Y`, they are hidden in version `X.Y+1`
 but can be re-enabled using the `--show-hidden-metrics-for-version=X.Y` escape hatch,