Stephen Krider
9cabcb7645
Add Dockerfile ( #1350 )
2023-10-31 12:36:47 -07:00
Jared Roesch
79a30912b8
Add py.typed so consumers of vLLM can get type checking ( #1509 )
* Add py.typed so consumers of vLLM can get type checking
* Update py.typed
---------
Co-authored-by: aarnphm <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-30 14:50:47 -07:00
chooper1
1f24755bf8
Support SqueezeLLM ( #1326 )
Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-10-21 23:14:59 -07:00
Woosuk Kwon
d0740dff1b
Fix error message on TORCH_CUDA_ARCH_LIST ( #1239 )
Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
2023-10-14 14:47:43 -07:00
Antoni Baum
cf5cb1e33e
Allocate more shared memory to attention kernel ( #1154 )
2023-09-26 22:27:13 -07:00
Woosuk Kwon
a425bd9a9a
[Setup] Enable TORCH_CUDA_ARCH_LIST for selecting target GPUs ( #1074 )
2023-09-26 10:21:08 -07:00
Woosuk Kwon
e3e79e9e8a
Implement AWQ quantization support for LLaMA ( #1032 )
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
2023-09-16 00:03:37 -07:00
Woosuk Kwon
d6770d1f23
Update setup.py ( #1006 )
2023-09-10 23:42:45 -07:00
Woosuk Kwon
a41c20435e
Add compute capability 8.9 to default targets ( #829 )
2023-08-23 07:28:38 +09:00
Xudong Zhang
65fc1c3127
set default compute capability according to cuda version ( #773 )
2023-08-21 16:05:44 -07:00
Cody Yu
2b7d3aca2e
Update setup.py ( #282 )
Co-authored-by: neubig <neubig@gmail.com>
2023-06-27 14:34:23 -07:00
Woosuk Kwon
570fb2e9cc
[PyPI] Fix package info in setup.py ( #158 )
2023-06-19 18:05:01 -07:00
Woosuk Kwon
dcda03b4cb
Write README and front page of doc ( #147 )
2023-06-18 03:19:38 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM ( #150 )
2023-06-17 03:07:40 -07:00
Woosuk Kwon
e38074b1e6
Support FP32 ( #141 )
2023-06-07 00:40:21 -07:00
Woosuk Kwon
376725ce74
[PyPI] Packaging for PyPI distribution ( #140 )
2023-06-05 20:03:14 -07:00
Woosuk Kwon
d721168449
Improve setup script & Add a guard for bfloat16 kernels ( #130 )
2023-05-27 00:59:32 -07:00
Woosuk Kwon
7addca5935
Specify python package dependencies in requirements.txt ( #78 )
2023-05-07 16:30:43 -07:00
Woosuk Kwon
e070829ae8
Support bfloat16 data type ( #54 )
2023-05-03 14:09:44 -07:00
Woosuk Kwon
436e523bf1
Refactor attention kernels ( #53 )
2023-05-03 13:40:13 -07:00
Woosuk Kwon
897cb2ae28
Optimize data movement ( #20 )
2023-04-02 00:30:17 -07:00
Woosuk Kwon
09e9245478
Add custom kernel for RMS normalization ( #16 )
2023-04-01 00:51:22 +08:00
Woosuk Kwon
88c0268a18
Implement custom kernel for LLaMA rotary embedding ( #14 )
2023-03-30 11:04:21 -07:00
Woosuk Kwon
0deacbce6e
Implement single_query_cached_kv_attention kernel ( #3 )
2023-03-01 15:02:19 -08:00
Woosuk Kwon
ffad4e1e03
cache_kernel -> cache_kernels
2023-02-16 20:05:45 +00:00
Woosuk Kwon
6f058c7ba8
Implement cache ops
2023-02-16 07:47:03 +00:00
Woosuk Kwon
3be29a1104
Add blank setup file
2023-02-09 11:37:06 +00:00