15 Commits

Author       SHA1        Date                        Message
Woosuk Kwon  c84e924287  2023-05-06 02:12:12 -07:00  [Minor] Fix a dtype bug (#79)
Woosuk Kwon  189ae23133  2023-05-04 03:05:37 -07:00  Use dtype from model config & Add Dolly V2 (#63)
Woosuk Kwon  e548c1488a  2023-05-04 02:59:56 -07:00  Add support for GPT-2 (#60)
Zhuohan Li   27f1410d06  2023-05-03 15:32:04 +08:00  New weight loader without np copy (#52)
Woosuk Kwon  a96d63c21d  2023-04-28 00:32:10 -07:00  Add support for GPT-NeoX (Pythia) (#50)
Woosuk Kwon  ee88a7e5f3  2023-04-08 23:36:12 -07:00  Add an option to use dummy model weights (#33)
Woosuk Kwon  80a2f812f1  2023-03-30 12:25:32 +08:00  Implement LLaMA (#9); Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Zhuohan Li   721fa3df15  2023-03-29 14:48:56 +08:00  FastAPI-based working frontend (#10)
Zhuohan Li   2f49f15585  2023-03-21 13:45:42 -07:00  Support tensor parallel (#2)
Woosuk Kwon  e9d3f2ff77  2023-03-11 23:23:14 -08:00  Add memory analyzer & utomatically configure KV cache size (#6)
Woosuk Kwon  1a7eb7da61  2023-03-10 09:58:21 -08:00  Support beam search & parallel generation (#7)
Woosuk Kwon  cbf8779afa  2023-02-24 16:29:36 -08:00  Fix a bug in tying OPT embeddings (#1)
Woosuk Kwon  1ce1333573  2023-02-23 21:31:39 +00:00  Set default dtype to half
Woosuk Kwon  608f74ffe5  2023-02-22 18:08:25 +00:00  Minor
Woosuk Kwon  709a69176e  2023-02-22 18:03:48 +00:00  Move worker/models -> models