diff --git a/examples/other/tensorize_vllm_model.py b/examples/other/tensorize_vllm_model.py
index 68345e6c..7d11ba51 100644
--- a/examples/other/tensorize_vllm_model.py
+++ b/examples/other/tensorize_vllm_model.py
@@ -27,7 +27,7 @@ https://github.com/coreweave/tensorizer
 To serialize a model, install vLLM from source, then run something like this
 from the root level of this repository:
 
-python -m examples.offline_inference.tensorize_vllm_model \
+python -m examples.other.tensorize_vllm_model \
     --model facebook/opt-125m \
     serialize \
     --serialized-directory s3://my-bucket \
@@ -47,7 +47,7 @@ providing a `--keyfile` argument.
 To deserialize a model, you can run something like this from the root
 level of this repository:
 
-python -m examples.offline_inference.tensorize_vllm_model \
+python -m examples.other.tensorize_vllm_model \
     --model EleutherAI/gpt-j-6B \
     --dtype float16 \
     deserialize \
@@ -65,11 +65,11 @@ shard's rank. Sharded models serialized with this script will be named as
 model-rank-%03d.tensors
 
 For more information on the available arguments for serializing, run
-`python -m examples.offline_inference.tensorize_vllm_model serialize --help`.
+`python -m examples.other.tensorize_vllm_model serialize --help`.
 
 Or for deserializing:
 
-`python -m examples.offline_inference.tensorize_vllm_model deserialize --help`.
+`python -m examples.other.tensorize_vllm_model deserialize --help`.
 
 Once a model is serialized, tensorizer can be invoked with the `LLM` class
 directly to load models:
@@ -90,7 +90,7 @@ TensorizerConfig arguments desired.
 In order to see all of the available arguments usable to configure loading
 with tensorizer that are given to `TensorizerConfig`, run:
 
-`python -m examples.offline_inference.tensorize_vllm_model deserialize --help`
+`python -m examples.other.tensorize_vllm_model deserialize --help`
 
 under the `tensorizer options` section. These can also be used for
 deserialization in this example script, although `--tensorizer-uri` and
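
As a companion to the docstring being updated above, a minimal loading sketch using the `LLM` class with `TensorizerConfig`, assuming a model was already serialized with the command shown in the docstring. The S3 URI below is hypothetical; substitute the path that the serialize step actually produced.

    # Sketch: load a previously serialized model directly with the LLM class.
    # The tensorizer_uri value is hypothetical and should point at the
    # model.tensors file written by the serialize step.
    from vllm import LLM
    from vllm.model_executor.model_loader.tensorizer import TensorizerConfig

    llm = LLM(
        model="facebook/opt-125m",
        load_format="tensorizer",
        model_loader_extra_config=TensorizerConfig(
            tensorizer_uri="s3://my-bucket/vllm/facebook/opt-125m/v1/model.tensors",
            num_readers=3,
        ),
    )
    print(llm.generate("Hello, my name is")[0].outputs[0].text)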