# CacheFlow
## Installation
```bash
pip install psutil numpy ray torch
pip install git+https://github.com/huggingface/transformers # Required for LLaMA.
pip install sentencepiece # Required for LlamaTokenizer.
pip install flash-attn # This may take up to 20 mins.
pip install -e .
```
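
Before starting a server, it can help to confirm the dependencies above actually installed. The following snippet is not part of CacheFlow; it is a small illustrative check (the `missing_packages` helper is a name introduced here, not a repo utility) that reports which of the core packages cannot be imported:

```python
# Hypothetical sanity check: report which core dependencies from the
# install steps above are missing from the current environment.
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of `names` with no importable module on this system."""
    return [n for n in names if find_spec(n) is None]

if __name__ == "__main__":
    deps = ["psutil", "numpy", "ray", "torch", "transformers", "sentencepiece"]
    missing = missing_packages(deps)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core dependencies found.")
```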
## Test the simple server
```bash
ray start --head
python simple_server.py
```
The full list of arguments for `simple_server.py` can be shown with:
```bash
python simple_server.py --help
```
## FastAPI server
Install the following additional dependencies:
```bash
pip install fastapi uvicorn
```
To start the server:
```bash
ray start --head
python -m cacheflow.http_frontend.fastapi_frontend
```
To test the server:
```bash
python -m cacheflow.http_frontend.test_cli_client
```
## Gradio web server
Install the following additional dependencies:
```bash
pip install gradio
```
To start the server:
```bash
python -m cacheflow.http_frontend.fastapi_frontend
# In another terminal
python -m cacheflow.http_frontend.gradio_webserver
```