CacheFlow

Installation

pip install psutil numpy torch transformers
pip install flash-attn # This may take up to 10 minutes.
pip install -e .
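To confirm the installation, you can check that each package is visible to the Python interpreter you installed into. The sketch below is an illustration and is not part of CacheFlow. The helper name `missing_modules` is hypothetical. It uses `importlib.util.find_spec`, which locates a top-level module without importing it, so it does not load torch or initialize CUDA. Note that the pip package `flash-attn` is imported as `flash_attn`, and the editable install provides the `cacheflow` package.

```python
import importlib.util

# Import names for the packages installed above. The pip package "flash-attn"
# is imported as "flash_attn"; "pip install -e ." provides "cacheflow".
REQUIRED_MODULES = ["psutil", "numpy", "torch", "transformers", "flash_attn", "cacheflow"]


def missing_modules(names):
    """Return the module names that cannot be located on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

This only checks that each module is present. A module that is found can still fail at import time, for example when a compiled extension does not match the installed torch build.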

Test the simple server

ray start --head
python simple_server.py

To list the detailed arguments for simple_server.py, run:

python simple_server.py --help

FastAPI server

Install the following additional dependencies:

pip install fastapi uvicorn

To start the server:

ray start --head
python -m cacheflow.http_frontend.fastapi_frontend

To test the server:

python -m cacheflow.http_frontend.test_cli_client

Gradio web server

Install the following additional dependency:

pip install gradio

To start the server:

ray start --head # Skip this if Ray is already running.
python -m cacheflow.http_frontend.fastapi_frontend
# In another terminal
python -m cacheflow.http_frontend.gradio_webserver
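The Gradio web server appears to depend on the FastAPI frontend already running, which is why the two commands are run in separate terminals. If you script the startup instead, one option is to wait until the frontend's port accepts TCP connections before launching the Gradio process. The helper below is a hypothetical sketch that uses only the standard library. It is not part of CacheFlow, and the host and port you pass must be the address the FastAPI frontend actually listens on, which this README does not state.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or timeout expires.

    Returns True if the port accepted a connection, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("localhost", port)` with the frontend's port, and start `python -m cacheflow.http_frontend.gradio_webserver` only after it returns `True`.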