zhaoyang-star
|
9090bf02e7
|
Support FP8-E5M2 KV Cache (#2279)
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
|
2024-01-28 16:43:54 -08:00 |
|
Casper
|
beb89f68b4
|
AWQ: Up to 2.66x higher throughput (#2566)
|
2024-01-26 23:53:17 -08:00 |
|
Woosuk Kwon
|
6ef00b03a2
|
Enable CUDA graph for GPTQ & SqueezeLLM (#2318)
|
2024-01-03 09:52:29 -08:00 |
|
Jee Li
|
77af974b40
|
[FIX] Support non-zero CUDA devices in custom kernels (#1959)
|
2024-01-02 19:09:59 -08:00 |
|
kliuae
|
1b7c791d60
|
[ROCm] Fixes for GPTQ on ROCm (#2180)
|
2023-12-18 10:41:04 -08:00 |
|
CHU Tianxiang
|
0fbfc4b81b
|
Add GPTQ support (#916)
|
2023-12-15 03:04:22 -08:00 |
|
TJian
|
6ccc0bfffb
|
Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
|
2023-12-07 23:16:52 -08:00 |
|
chooper1
|
1f24755bf8
|
Support SqueezeLLM (#1326)
Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2023-10-21 23:14:59 -07:00 |
|
Woosuk Kwon
|
29678cd213
|
Minor fix on AWQ kernel launch (#1356)
|
2023-10-15 21:53:56 -07:00 |
|
CHU Tianxiang
|
980dd4a2c4
|
Fix overflow in awq kernel (#1295)
Co-authored-by: 楚天翔 <tianxiang.ctx@alibaba-inc.com>
|
2023-10-11 00:19:53 -07:00 |
|
twaka
|
8285736840
|
workaround of AWQ for Turing GPUs (#1252)
|
2023-10-10 19:48:16 -07:00 |
|
Woosuk Kwon
|
2b1c116b5a
|
Add minimum capability requirement for AWQ (#1064)
|
2023-09-18 12:02:01 -07:00 |
|
Woosuk Kwon
|
e3e79e9e8a
|
Implement AWQ quantization support for LLaMA (#1032)
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
|
2023-09-16 00:03:37 -07:00 |
|