Woosuk Kwon | d0740dff1b | Fix error message on TORCH_CUDA_ARCH_LIST (#1239) | 2023-10-14 14:47:43 -07:00
    Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
Antoni Baum | cf5cb1e33e | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00
Woosuk Kwon | a425bd9a9a | [Setup] Enable TORCH_CUDA_ARCH_LIST for selecting target GPUs (#1074) | 2023-09-26 10:21:08 -07:00
Woosuk Kwon | e3e79e9e8a | Implement AWQ quantization support for LLaMA (#1032) | 2023-09-16 00:03:37 -07:00
    Co-authored-by: Robert Irvine <robert@seamlessml.com>
    Co-authored-by: root <rirv938@gmail.com>
    Co-authored-by: Casper <casperbh.96@gmail.com>
    Co-authored-by: julian-q <julianhquevedo@gmail.com>
Woosuk Kwon | d6770d1f23 | Update setup.py (#1006) | 2023-09-10 23:42:45 -07:00
Woosuk Kwon | a41c20435e | Add compute capability 8.9 to default targets (#829) | 2023-08-23 07:28:38 +09:00
Xudong Zhang | 65fc1c3127 | Set default compute capability according to CUDA version (#773) | 2023-08-21 16:05:44 -07:00
Cody Yu | 2b7d3aca2e | Update setup.py (#282) | 2023-06-27 14:34:23 -07:00
    Co-authored-by: neubig <neubig@gmail.com>
Woosuk Kwon | 570fb2e9cc | [PyPI] Fix package info in setup.py (#158) | 2023-06-19 18:05:01 -07:00
Woosuk Kwon | dcda03b4cb | Write README and front page of doc (#147) | 2023-06-18 03:19:38 -07:00
Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00
Woosuk Kwon | e38074b1e6 | Support FP32 (#141) | 2023-06-07 00:40:21 -07:00
Woosuk Kwon | 376725ce74 | [PyPI] Packaging for PyPI distribution (#140) | 2023-06-05 20:03:14 -07:00
Woosuk Kwon | d721168449 | Improve setup script & Add a guard for bfloat16 kernels (#130) | 2023-05-27 00:59:32 -07:00
Woosuk Kwon | 7addca5935 | Specify python package dependencies in requirements.txt (#78) | 2023-05-07 16:30:43 -07:00
Woosuk Kwon | e070829ae8 | Support bfloat16 data type (#54) | 2023-05-03 14:09:44 -07:00
Woosuk Kwon | 436e523bf1 | Refactor attention kernels (#53) | 2023-05-03 13:40:13 -07:00
Woosuk Kwon | 897cb2ae28 | Optimize data movement (#20) | 2023-04-02 00:30:17 -07:00
Woosuk Kwon | 09e9245478 | Add custom kernel for RMS normalization (#16) | 2023-04-01 00:51:22 +08:00
Woosuk Kwon | 88c0268a18 | Implement custom kernel for LLaMA rotary embedding (#14) | 2023-03-30 11:04:21 -07:00
Woosuk Kwon | 0deacbce6e | Implement single_query_cached_kv_attention kernel (#3) | 2023-03-01 15:02:19 -08:00
Woosuk Kwon | ffad4e1e03 | cache_kernel -> cache_kernels | 2023-02-16 20:05:45 +00:00
Woosuk Kwon | 6f058c7ba8 | Implement cache ops | 2023-02-16 07:47:03 +00:00
Woosuk Kwon | 3be29a1104 | Add blank setup file | 2023-02-09 11:37:06 +00:00