20231088/vllm

[Core] Integrate fastsafetensors loader for loading model weights (#10647)

Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com>

2025-03-24 08:08:02 -07:00


Loading Model weights with fastsafetensors

The fastsafetensors library loads model weights into GPU memory by leveraging GPUDirect Storage. See https://github.com/foundation-model-stack/fastsafetensors for more details. To enable this feature, set the environment variable USE_FASTSAFETENSOR to true.
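
A minimal sketch of enabling the feature from Python, assuming the variable has to be set in the process environment before the model loader runs. The helper `fastsafetensors_requested` is hypothetical and only illustrates a case-insensitive check for "true"; how vLLM itself parses the value is not stated here.

```python
import os

# Set the flag before any model-loading code runs in this process.
# Assumption: the loader reads the variable at load time.
os.environ["USE_FASTSAFETENSOR"] = "true"


def fastsafetensors_requested(env=os.environ) -> bool:
    """Hypothetical helper: report whether the flag is set to "true".

    This mirrors a common truthy-string check and is not vLLM's own logic.
    """
    return env.get("USE_FASTSAFETENSOR", "").strip().lower() == "true"


if __name__ == "__main__":
    print("fastsafetensors requested:", fastsafetensors_requested())
```

With the variable set, the model is then loaded as usual. Exporting the variable in the shell before launching the process should have the same effect.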
