9 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Woosuk Kwon | c9d5b6d4a8 | Replace FlashAttention with xformers (#70) | 2023-05-05 02:01:08 -07:00 |
| Woosuk Kwon | e548c1488a | Add support for GPT-2 (#60) | 2023-05-04 02:59:56 -07:00 |
| Woosuk Kwon | a96d63c21d | Add support for GPT-NeoX (Pythia) (#50) | 2023-04-28 00:32:10 -07:00 |
| Woosuk Kwon | 88c0268a18 | Implement custom kernel for LLaMA rotary embedding (#14) | 2023-03-30 11:04:21 -07:00 |
| Woosuk Kwon | 80a2f812f1 | Implement LLaMA (#9) (Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>) | 2023-03-30 12:25:32 +08:00 |
| Zhuohan Li | 721fa3df15 | FastAPI-based working frontend (#10) | 2023-03-29 14:48:56 +08:00 |
| Zhuohan Li | 2f49f15585 | Support tensor parallel (#2) | 2023-03-21 13:45:42 -07:00 |
| Woosuk Kwon | cfae35b861 | Add miscellaneous updates (#8) | 2023-03-13 13:48:38 -07:00 |
| Woosuk Kwon | e9d3f2ff77 | Add memory analyzer & automatically configure KV cache size (#6) | 2023-03-11 23:23:14 -08:00 |