(input-processing-pipeline)=
# Input Processing Pipeline
- Input data is passed to {class}`~vllm.LLMEngine` (or {class}`~vllm.AsyncLLMEngine`).

- Tokenize the data if necessary.

- Process the inputs using {meth}`INPUT_REGISTRY.process_input <vllm.inputs.registry.InputRegistry.process_input>`.

  - For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.

- Send the processed inputs to {class}`~vllm.executor.executor_base.ExecutorBase`.

- Distribute the inputs via {class}`~vllm.worker.worker_base.WorkerBase` to {class}`~vllm.worker.model_runner_base.ModelRunnerBase`.

- If the data contains multi-modal data, convert it into keyword arguments using {meth}`MULTIMODAL_REGISTRY.map_input <vllm.multimodal.MultiModalRegistry.map_input>`.

  - For example, convert a {class}`PIL.Image.Image` input to its pixel values for a vision model.
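
The data flow through the steps above can be illustrated with a small toy model. The sketch below does not use vLLM at all: `ToyInputs`, `tokenize`, `process_input`, `map_input`, `run_pipeline`, and the two constants are hypothetical stand-ins invented for illustration, and the image is a plain nested list rather than a `PIL.Image.Image`. It only shows the shape of the idea: tokenize, reserve placeholder positions for the multi-modal embeddings, then turn the raw multi-modal data into keyword arguments for the model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical values chosen for this sketch; real models define their own.
IMAGE_TOKEN_ID = -1    # placeholder token id standing in for an image embedding slot
NUM_IMAGE_TOKENS = 4   # number of embedding positions reserved per image


@dataclass
class ToyInputs:
    """Toy stand-in for the inputs that travel through the pipeline."""
    prompt_token_ids: List[int]
    multi_modal_data: Dict[str, Any] = field(default_factory=dict)


def tokenize(prompt: str) -> List[int]:
    """Toy tokenizer: one 'token id' per whitespace-separated word (its length)."""
    return [len(word) for word in prompt.split()]


def process_input(inputs: ToyInputs) -> ToyInputs:
    """Prepend placeholder tokens so positions are reserved for image embeddings."""
    if "image" not in inputs.multi_modal_data:
        return inputs
    placeholders = [IMAGE_TOKEN_ID] * NUM_IMAGE_TOKENS
    return ToyInputs(placeholders + inputs.prompt_token_ids, inputs.multi_modal_data)


def map_input(multi_modal_data: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a raw image (rows of 0-255 ints) into model keyword arguments."""
    image = multi_modal_data.get("image")
    if image is None:
        return {}
    pixel_values = [[px / 255.0 for px in row] for row in image]
    return {"pixel_values": pixel_values}


def run_pipeline(
    prompt: str, image: Optional[List[List[int]]] = None
) -> Tuple[List[int], Dict[str, Any]]:
    """Chain the toy stages in the same order as the steps listed above."""
    mm_data = {"image": image} if image is not None else {}
    inputs = ToyInputs(tokenize(prompt), mm_data)
    inputs = process_input(inputs)
    model_kwargs = map_input(inputs.multi_modal_data)
    return inputs.prompt_token_ids, model_kwargs


if __name__ == "__main__":
    token_ids, kwargs = run_pipeline("describe this image", image=[[0, 255], [51, 102]])
    print(token_ids)
    print(kwargs)
```

In this toy version the placeholder tokens are simply prepended to the prompt. Where a real model places them, and how many it needs, depends on the model and is not something this sketch tries to reproduce. The executor and worker hops are omitted because they only move the inputs between processes without changing them.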