9 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Woosuk Kwon | 7c041ab578 | Refactor system architecture (#82) | 2023-05-09 15:30:12 -07:00 |
| Zhuohan Li | 27f1410d06 | New weight loader without np copy (#52) | 2023-05-03 15:32:04 +08:00 |
| Zhuohan Li | 4858f3bb45 | Add an option to launch cacheflow without ray (#51) | 2023-04-30 15:42:17 +08:00 |
| Woosuk Kwon | ee88a7e5f3 | Add an option to use dummy model weights (#33) | 2023-04-08 23:36:12 -07:00 |
| Woosuk Kwon | 12659a0bd7 | Add CUDA graph-based all reduce launcher (#26) | 2023-04-05 11:16:57 -07:00 |
| Zhuohan Li | 2f49f15585 | Support tensor parallel (#2) | 2023-03-21 13:45:42 -07:00 |
| Woosuk Kwon | 1a7eb7da61 | Support beam search & parallel generation (#7) | 2023-03-10 09:58:21 -08:00 |
| Woosuk Kwon | 1ce1333573 | Set default dtype to half | 2023-02-23 21:31:39 +00:00 |
| Woosuk Kwon | 1f6c7ef437 | Add controller | 2023-02-23 09:32:19 +00:00 |