Commit      Date                        Author          Message
c128d69856  2023-08-31 17:18:34 -07:00  Zhuohan Li      Fix README.md Link (#927)
0080d8329d  2023-08-30 02:26:47 -07:00  Zhuohan Li      Add acknowledgement to a16z grant
85ebcda94d  2023-08-22 20:48:36 -07:00  ldwang          Fix typo of Aquila in README.md (#836)
14f9c72bfd  2023-08-22 11:51:44 -07:00  Zhuohan Li      Update Supported Model List (#825)
f7389f4763  2023-08-02 16:45:12 -07:00  Zhuohan Li      [Doc] Add Baichuan 13B to supported models (#656)
1b0bd0fe8a  2023-08-02 14:04:39 -07:00  Zhuohan Li      Add Falcon support (new) (#592)
df5dd3c68e  2023-07-25 15:25:12 -07:00  Zhuohan Li      Add Baichuan-7B to README (#494)
6fc2a38b11  2023-07-20 11:38:27 -07:00  Zhuohan Li      Add support for LLaMA-2 (#505)
c894836108  2023-07-08 17:55:16 -07:00  Andre Slavescu  [Model] Add support for GPT-J (#226)
                                                        Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
404422f42e  2023-07-03 16:47:53 -07:00  Woosuk Kwon     [Model] Add support for MPT (#334)
e41f06702c  2023-07-03 13:12:35 -07:00  Woosuk Kwon     Add support for BLOOM (#331)
f72297562f  2023-06-29 12:32:37 -07:00  Zhanghao Wu     Add news for the vllm+skypilot example (#314)
2cf1a333b6  2023-06-26 11:34:23 -07:00  Zhuohan Li      [Doc] Documentation for distributed inference (#261)
6214dd6ce9  2023-06-25 16:58:06 -07:00  Lianmin Zheng   Update README.md (#236)
665c48963b  2023-06-22 15:05:11 -07:00  Woosuk Kwon     [Docs] Add GPTBigCode to supported models (#213)
033f5c78f5  2023-06-20 14:00:28 +08:00  Zhuohan Li      Remove e.g. in README (#167)
794e578de0  2023-06-19 22:57:14 -07:00  Woosuk Kwon     [Minor] Fix URLs (#166)
fc72e39de3  2023-06-20 11:15:15 +08:00  Zhuohan Li      Change image urls (#164)
b7e62d3454  2023-06-19 20:03:40 -07:00  Woosuk Kwon     Fix repo & documentation URLs (#163)
364536acd1  2023-06-19 19:58:23 -07:00  Woosuk Kwon     [Docs] Minor fix (#162)
0b32a987dd  2023-06-20 10:57:46 +08:00  Zhuohan Li      Add and list supported models in README (#161)
a255885f83  2023-06-19 16:31:13 +08:00  Zhuohan Li      Add logo and polish readme (#156)
dcda03b4cb  2023-06-18 03:19:38 -07:00  Woosuk Kwon     Write README and front page of doc (#147)
0b98ba15c7  2023-06-17 03:07:40 -07:00  Woosuk Kwon     Change the name to vLLM (#150)
c3442c1f6f  2023-05-20 13:06:59 -07:00  Woosuk Kwon     Refactor system architecture (#109)
7addca5935  2023-05-07 16:30:43 -07:00  Woosuk Kwon     Specify python package dependencies in requirements.txt (#78)
c9d5b6d4a8  2023-05-05 02:01:08 -07:00  Woosuk Kwon     Replace FlashAttention with xformers (#70)
2c5cd0defe  2023-04-01 19:00:20 -07:00  Woosuk Kwon     Add ninja to dependency (#21)
e3f00d191e  2023-04-01 01:07:57 +08:00  Zhuohan Li      Modify README to include info on loading LLaMA (#18)
80a2f812f1  2023-03-30 12:25:32 +08:00  Woosuk Kwon     Implement LLaMA (#9)
                                                        Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
721fa3df15  2023-03-29 14:48:56 +08:00  Zhuohan Li      FastAPI-based working frontend (#10)
2f49f15585  2023-03-21 13:45:42 -07:00  Zhuohan Li      Support tensor parallel (#2)
e9d3f2ff77  2023-03-11 23:23:14 -08:00  Woosuk Kwon     Add memory analyzer & automatically configure KV cache size (#6)
3e9f991d6a  2023-03-01 21:13:08 -08:00  Woosuk Kwon     Use FlashAttention for multi_query_kv_attention (#4)
c84c708a1d  2023-02-24 12:04:49 +00:00  Woosuk Kwon     Add README
e7d9d9c08c  2023-02-09 11:24:15 +00:00  Woosuk Kwon     Initial commit