# CacheFlow
## Installation
```bash
pip install psutil numpy ray torch
pip install git+https://github.com/huggingface/transformers # Required for LLaMA.
pip install sentencepiece # Required for LlamaTokenizer.
pip install flash-attn # This may take up to 20 mins.
pip install -e .
```
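
Before starting a server, it can help to confirm the dependencies above actually installed. The following snippet is not part of CacheFlow; it is a small illustrative check (the `missing_packages` helper is a name introduced here, not a repo utility) that reports which of the core packages cannot be imported:

```python
# Hypothetical sanity check: report which core dependencies from the
# install steps above are missing from the current environment.
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of `names` with no importable module on this system."""
    return [n for n in names if find_spec(n) is None]

if __name__ == "__main__":
    deps = ["psutil", "numpy", "ray", "torch", "transformers", "sentencepiece"]
    missing = missing_packages(deps)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core dependencies found.")
```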
## Test the simple server
```bash
ray start --head
python simple_server.py
```
The full list of arguments for `simple_server.py` can be shown with:
```bash
python simple_server.py --help
```
## FastAPI server
Install the following additional dependencies:
```bash
pip install fastapi uvicorn
```
To start the server:
```bash
ray start --head
python -m cacheflow.http_frontend.fastapi_frontend
```
To test the server:
```bash
python -m cacheflow.http_frontend.test_cli_client
```
## Gradio web server
Install the following additional dependencies:
```bash
pip install gradio
```
To start the server:
```bash
python -m cacheflow.http_frontend.fastapi_frontend
# In another terminal
python -m cacheflow.http_frontend.gradio_webserver
```