
Rafael Vasquez 32aa2059ad

Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>

2024-12-23 22:35:38 +00:00

(input-processing-pipeline)=

# Input Processing Pipeline

1. Pass the input data to {class}`~vllm.LLMEngine` (or {class}`~vllm.AsyncLLMEngine`).
2. Tokenize the data if necessary.
3. Process the inputs using {meth}`INPUT_REGISTRY.process_input <vllm.inputs.registry.InputRegistry.process_input>`.
   - For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.
4. Send the processed inputs to {class}`~vllm.executor.executor_base.ExecutorBase`.
5. Distribute the inputs via {class}`~vllm.worker.worker_base.WorkerBase` to {class}`~vllm.worker.model_runner_base.ModelRunnerBase`.
6. If the inputs contain multi-modal data, convert it into keyword arguments using {meth}`MULTIMODAL_REGISTRY.map_input <vllm.multimodal.MultiModalRegistry.map_input>`.
   - For example, convert a {class}`PIL.Image.Image` input to its pixel values for a vision model.
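
To make the tokenize, process, and map steps more concrete, here is a toy sketch in plain Python. It does not use vLLM's API: `ToyInputs`, `toy_tokenize`, `toy_process_input`, `toy_map_input`, the placeholder ID, and the number of placeholder tokens per image are all hypothetical names and values chosen for illustration, and the "image" is a nested list of grayscale values rather than a `PIL.Image.Image`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical values for illustration only; real models define their own.
IMAGE_PLACEHOLDER_ID = -1
TOKENS_PER_IMAGE = 4


@dataclass
class ToyInputs:
    """Toy stand-in for the inputs that flow through the pipeline."""
    prompt_token_ids: list[int]
    multi_modal_data: dict[str, Any] = field(default_factory=dict)


def toy_tokenize(prompt: str) -> list[int]:
    """Tokenize step: one fake token ID (the word length) per word."""
    return [len(word) for word in prompt.split()]


def toy_process_input(inputs: ToyInputs) -> ToyInputs:
    """Process step: prepend placeholder tokens so that sequence
    positions are reserved for the image embeddings."""
    if "image" not in inputs.multi_modal_data:
        return inputs
    placeholders = [IMAGE_PLACEHOLDER_ID] * TOKENS_PER_IMAGE
    return ToyInputs(placeholders + inputs.prompt_token_ids,
                     inputs.multi_modal_data)


def toy_map_input(multi_modal_data: dict[str, Any]) -> dict[str, Any]:
    """Map step: convert raw multi-modal data into keyword arguments
    for the model (here, grayscale pixels scaled to the range 0..1)."""
    if "image" not in multi_modal_data:
        return {}
    image = multi_modal_data["image"]
    pixel_values = [[px / 255 for px in row] for row in image]
    return {"pixel_values": pixel_values}


if __name__ == "__main__":
    raw = ToyInputs(toy_tokenize("describe this image"),
                    {"image": [[0, 255], [255, 0]]})
    processed = toy_process_input(raw)
    print(processed.prompt_token_ids)
    print(toy_map_input(processed.multi_modal_data))
```

In this sketch the placeholder tokens are added before the inputs are handed off, while the conversion to keyword arguments happens last, mirroring the order of the steps above; the executor and worker hand-off in between is omitted.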