20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Stephen Krider	9cabcb7645	Add Dockerfile (#1350)	2023-10-31 12:36:47 -07:00
Jared Roesch	79a30912b8	Add py.typed so consumers of vLLM can get type checking (#1509) * Add py.typed so consumers of vLLM can get type checking * Update py.typed --------- Co-authored-by: aarnphm <29749331+aarnphm@users.noreply.github.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2023-10-30 14:50:47 -07:00
chooper1	1f24755bf8	Support SqueezeLLM (#1326) Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2023-10-21 23:14:59 -07:00
Woosuk Kwon	d0740dff1b	Fix error message on `TORCH_CUDA_ARCH_LIST` (#1239) Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>	2023-10-14 14:47:43 -07:00
Antoni Baum	cf5cb1e33e	Allocate more shared memory to attention kernel (#1154)	2023-09-26 22:27:13 -07:00
Woosuk Kwon	a425bd9a9a	[Setup] Enable `TORCH_CUDA_ARCH_LIST` for selecting target GPUs (#1074)	2023-09-26 10:21:08 -07:00
Woosuk Kwon	e3e79e9e8a	Implement AWQ quantization support for LLaMA (#1032) Co-authored-by: Robert Irvine <robert@seamlessml.com> Co-authored-by: root <rirv938@gmail.com> Co-authored-by: Casper <casperbh.96@gmail.com> Co-authored-by: julian-q <julianhquevedo@gmail.com>	2023-09-16 00:03:37 -07:00
Woosuk Kwon	d6770d1f23	Update setup.py (#1006)	2023-09-10 23:42:45 -07:00
Woosuk Kwon	a41c20435e	Add compute capability 8.9 to default targets (#829)	2023-08-23 07:28:38 +09:00
Xudong Zhang	65fc1c3127	set default coompute capability according to cuda version (#773)	2023-08-21 16:05:44 -07:00
Cody Yu	2b7d3aca2e	Update setup.py (#282) Co-authored-by: neubig <neubig@gmail.com>	2023-06-27 14:34:23 -07:00
Woosuk Kwon	570fb2e9cc	[PyPI] Fix package info in setup.py (#158)	2023-06-19 18:05:01 -07:00
Woosuk Kwon	dcda03b4cb	Write README and front page of doc (#147)	2023-06-18 03:19:38 -07:00
Woosuk Kwon	0b98ba15c7	Change the name to vLLM (#150)	2023-06-17 03:07:40 -07:00
Woosuk Kwon	e38074b1e6	Support FP32 (#141)	2023-06-07 00:40:21 -07:00
Woosuk Kwon	376725ce74	[PyPI] Packaging for PyPI distribution (#140)	2023-06-05 20:03:14 -07:00
Woosuk Kwon	d721168449	Improve setup script & Add a guard for bfloat16 kernels (#130)	2023-05-27 00:59:32 -07:00
Woosuk Kwon	7addca5935	Specify python package dependencies in requirements.txt (#78)	2023-05-07 16:30:43 -07:00
Woosuk Kwon	e070829ae8	Support bfloat16 data type (#54)	2023-05-03 14:09:44 -07:00
Woosuk Kwon	436e523bf1	Refactor attention kernels (#53)	2023-05-03 13:40:13 -07:00
Woosuk Kwon	897cb2ae28	Optimize data movement (#20)	2023-04-02 00:30:17 -07:00
Woosuk Kwon	09e9245478	Add custom kernel for RMS normalization (#16)	2023-04-01 00:51:22 +08:00
Woosuk Kwon	88c0268a18	Implement custom kernel for LLaMA rotary embedding (#14)	2023-03-30 11:04:21 -07:00
Woosuk Kwon	0deacbce6e	Implement `single_query_cached_kv_attention` kernel (#3)	2023-03-01 15:02:19 -08:00
Woosuk Kwon	ffad4e1e03	cache_kernel -> cache_kernels	2023-02-16 20:05:45 +00:00
Woosuk Kwon	6f058c7ba8	Implement cache ops	2023-02-16 07:47:03 +00:00
Woosuk Kwon	3be29a1104	Add blank setup file	2023-02-09 11:37:06 +00:00

27 Commits