Zhuohan Li | 2f49f15585 | Support tensor parallel (#2) | 2023-03-21 13:45:42 -07:00
Woosuk Kwon | cfae35b861 | Add miscellaneous updates (#8) | 2023-03-13 13:48:38 -07:00
Woosuk Kwon | 04e5acc08e | Fix a bug in 1D input shape (#5) | 2023-03-06 10:05:27 -08:00
Woosuk Kwon | 3e9f991d6a | Use FlashAttention for multi_query_kv_attention (#4) | 2023-03-01 21:13:08 -08:00
Woosuk Kwon | 0deacbce6e | Implement single_query_cached_kv_attention kernel (#3) | 2023-03-01 15:02:19 -08:00
Woosuk Kwon | 762fd1c3fa | Refactor and annotate types for attention | 2023-02-24 08:58:46 +00:00
Woosuk Kwon | 7f22f90e8c | Remove xformers | 2023-02-24 08:36:16 +00:00
Woosuk Kwon | 932844f1cd | Fix attention | 2023-02-23 23:02:25 +00:00
Woosuk Kwon | ba84b8728a | Fix attention | 2023-02-23 22:29:46 +00:00
Woosuk Kwon | 87e0bcd426 | Fix attention | 2023-02-23 21:32:02 +00:00
Woosuk Kwon | d4bc1a4d24 | Add unoptimized OPT Attention | 2023-02-23 09:31:55 +00:00