vllm/examples/offline_inference_embedding.py

from vllm import LLM

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create an LLM.
model = LLM(
    model="intfloat/e5-mistral-7b-instruct",
    task="embed",  # You should pass task="embed" for embedding models
    enforce_eager=True,  # Run in eager mode; skips CUDA graph capture
)

# Generate embeddings. The output is a list of PoolingRequestOutput objects.
outputs = model.encode(prompts)
# Print the outputs.
for output in outputs:
    print(output.outputs.embedding)  # list of 4096 floats