15 Commits

Author       SHA1        Message                                                  Date
Woosuk Kwon  7c041ab578  Refactor system architecture (#82)                       2023-05-09 15:30:12 -07:00
Woosuk Kwon  c9d5b6d4a8  Replace FlashAttention with xformers (#70)               2023-05-05 02:01:08 -07:00
Zhuohan Li   27f1410d06  New weight loader without np copy (#52)                  2023-05-03 15:32:04 +08:00
Woosuk Kwon  ee88a7e5f3  Add an option to use dummy model weights (#33)           2023-04-08 23:36:12 -07:00
Woosuk Kwon  12659a0bd7  Add CUDA graph-based all reduce launcher (#26)           2023-04-05 11:16:57 -07:00
Woosuk Kwon  897cb2ae28  Optimize data movement (#20)                             2023-04-02 00:30:17 -07:00
Zhuohan Li   2f49f15585  Support tensor parallel (#2)                             2023-03-21 13:45:42 -07:00
Woosuk Kwon  cfae35b861  Add miscellaneous updates (#8)                           2023-03-13 13:48:38 -07:00
Woosuk Kwon  1a7eb7da61  Support beam search & parallel generation (#7)           2023-03-10 09:58:21 -08:00
Woosuk Kwon  0deacbce6e  Implement single_query_cached_kv_attention kernel (#3)   2023-03-01 15:02:19 -08:00
Woosuk Kwon  1ce1333573  Set default dtype to half                                2023-02-23 21:31:39 +00:00
Woosuk Kwon  fdd0f2f472  Minor                                                    2023-02-23 20:23:47 +00:00
Woosuk Kwon  343cea3dbc  Add seq_ids to input metadata                            2023-02-23 09:25:01 +00:00
Woosuk Kwon  4b1ac23f53  Fix slot mapping                                         2023-02-23 00:10:07 +00:00
Woosuk Kwon  8290fce47d  Add Worker class                                         2023-02-22 19:01:38 +00:00