# CacheFlow

## Installation

```bash
pip install psutil numpy ray torch
pip install git+https://github.com/huggingface/transformers  # Required for LLaMA.
pip install sentencepiece  # Required for LlamaTokenizer.
pip install flash-attn  # This may take up to 20 mins.
pip install -e .
```

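As a quick sanity check after the steps above, the following sketch reports which of the installed packages Python can locate. It is not part of CacheFlow: `check_module` and `PYTHON_BIN` are hypothetical names introduced here, and the script assumes the interpreter is available as `python` (or `python3`) in the same environment used for `pip install`.

```bash
# Sketch: report which of the packages installed above Python can locate.
# Assumption: the interpreter is `python`, falling back to `python3`;
# set PYTHON_BIN to target a specific environment instead.
PYTHON_BIN="${PYTHON_BIN:-$(command -v python || command -v python3)}"

# check_module is a hypothetical helper, not part of CacheFlow.
# It prints "ok: <name>" if the top-level module can be found,
# and "MISSING: <name>" otherwise. The module is located, not imported.
check_module() {
  if "$PYTHON_BIN" -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"; then
    echo "ok: $1"
  else
    echo "MISSING: $1"
  fi
}

# pip names and import names differ in one case: flash-attn is imported as flash_attn.
for mod in psutil numpy ray torch transformers sentencepiece flash_attn cacheflow; do
  check_module "$mod"
done
```

A `MISSING` line only means the module was not found on the interpreter's search path; it does not verify that the package works with your GPU or CUDA setup.
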
## Test simple server

```bash
ray start --head
python simple_server.py
```

To see the full list of arguments for `simple_server.py`, run:

```bash
python simple_server.py --help
```

## FastAPI server

Install the following additional dependencies:

```bash
pip install fastapi uvicorn
```

To start the server:

```bash
ray start --head
python -m cacheflow.http_frontend.fastapi_frontend
```

To test the server:

```bash
python -m cacheflow.http_frontend.test_cli_client
```

## Gradio web server

Install the following additional dependencies:

```bash
pip install gradio
```

Start the FastAPI server, then the Gradio web server in a second terminal:

```bash
python -m cacheflow.http_frontend.fastapi_frontend
# In another terminal
python -m cacheflow.http_frontend.gradio_webserver
```