vllm/examples/online_serving/gradio_webserver.py

# SPDX-License-Identifier: Apache-2.0
"""Example for starting a Gradio Webserver
Start vLLM API server:
    python -m vllm.entrypoints.api_server \
        --model meta-llama/Llama-2-7b-chat-hf

Start Webserver:
    python examples/online_serving/gradio_webserver.py

Note that `pip install --upgrade gradio` is needed to run this example.
More details: https://github.com/gradio-app/gradio

If your antivirus software blocks the download of frpc for gradio,
you can install it manually by following these steps:

1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.3/frpc_linux_amd64
2. Rename the downloaded file to: frpc_linux_amd64_v0.3
3. Move the file to this location: /home/user/.cache/huggingface/gradio/frpc
"""
import argparse
import json

import gradio as gr
import requests


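def http_bot(prompt):
    """Stream generated text for `prompt` from the vLLM API server.

    Reads the endpoint from the module-level `args` set in `__main__`.
    """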
    headers = {"User-Agent": "vLLM Client"}
    pload = {
        "prompt": prompt,
        "stream": True,
        "max_tokens": 128,
    }
    response = requests.post(args.model_url,
                             headers=headers,
                             json=pload,
                             stream=True)

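    # The response is read as newline-delimited JSON; empty lines are skipped.
    for chunk in response.iter_lines(chunk_size=8192,
                                     decode_unicode=False,
                                     delimiter=b"\n"):
        if chunk:
            data = json.loads(chunk.decode("utf-8"))
            # Each yielded value replaces the contents of the output box.
            output = data["text"][0]
            yield output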


def build_demo():
    with gr.Blocks() as demo:
        gr.Markdown("# vLLM text completion demo\n")
        inputbox = gr.Textbox(label="Input",
                              placeholder="Enter text and press ENTER")
        outputbox = gr.Textbox(label="Output",
                               placeholder="Generated result from the model")
        inputbox.submit(http_bot, [inputbox], [outputbox])
    return demo


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default=None)
    parser.add_argument("--port", type=int, default=8001)
    parser.add_argument("--model-url",
                        type=str,
                        default="http://localhost:8000/generate")
    return parser.parse_args()


def main(args):
    demo = build_demo()
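    # share=True asks Gradio to create a public share link, which is
    # what requires the frpc binary mentioned in the module docstring.
    demo.queue().launch(server_name=args.host,
                        server_port=args.port,
                        share=True)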


if __name__ == "__main__":
    args = parse_args()
    main(args)
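
The parsing step inside `http_bot` can be tried without a running server. The following is a minimal sketch, not part of the example file: `iter_stream_text` is a hypothetical helper name, and the sample payloads are invented to match the shape the loop reads (`data["text"][0]`). It assumes each line is a complete JSON object, as the loop above does.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the first entry of "text" from each non-empty JSON line.

    Hypothetical helper mirroring the loop body of http_bot.
    """
    for line in lines:
        if not line:
            continue  # skip empty lines, as http_bot does
        data = json.loads(line.decode("utf-8"))
        yield data["text"][0]


# Invented payloads shaped like the chunks http_bot expects.
sample = [
    b'{"text": ["Hello"]}',
    b"",
    b'{"text": ["Hello, world"]}',
]
outputs = list(iter_stream_text(sample))
print(outputs[-1])
```

Because Gradio replaces the output component with every value a generator yields, the last yielded string is what remains in the output box once the stream ends.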