vllm/docs/source/design/input_processing/input_processing_pipeline.md

(input-processing-pipeline)=

# Input Processing Pipeline

1. Input data is passed to {class}`~vllm.LLMEngine` (or {class}`~vllm.AsyncLLMEngine`).

2. Tokenize the data if necessary.

3. Process the inputs using {meth}`INPUT_REGISTRY.process_input <vllm.inputs.registry.InputRegistry.process_input>`.

   - For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.

4. Send the processed inputs to {class}`~vllm.executor.executor_base.ExecutorBase`.

5. Distribute the inputs via {class}`~vllm.worker.worker_base.WorkerBase` to {class}`~vllm.worker.model_runner_base.ModelRunnerBase`.

6. If the data contains multi-modal data, convert it into keyword arguments using {meth}`MULTIMODAL_REGISTRY.map_input <vllm.multimodal.MultiModalRegistry.map_input>`.

   - For example, convert a {class}`PIL.Image.Image` input to its pixel values for a vision model.
[Docs] Convert rST to MyST (Markdown) (#11145) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> 2024-12-23 17:35:38 -05:00			`(input-processing-pipeline)=`

			`# Input Processing Pipeline`

			1. Input data is passed to {class}`~vllm.LLMEngine` (or {class}`~vllm.AsyncLLMEngine`).

			`2. Tokenize the data if necessary.`

			3. Process the inputs using {meth}`INPUT_REGISTRY.process_input <vllm.inputs.registry.InputRegistry.process_input>`.

			`- For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.`

			4. Send the processed inputs to {class}`~vllm.executor.executor_base.ExecutorBase`.

			5. Distribute the inputs via {class}`~vllm.worker.worker_base.WorkerBase` to {class}`~vllm.worker.model_runner_base.ModelRunnerBase`.

			6. If the data contains multi-modal data, convert it into keyword arguments using {meth}`MULTIMODAL_REGISTRY.map_input <vllm.multimodal.MultiModalRegistry.map_input>`.

			- For example, convert a {class}`PIL.Image.Image` input to its pixel values for a vision model.