Woosuk Kwon
abfc4f3387
[Misc] Use dataclass for InputMetadata ( #3452 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-03-17 10:02:46 +00:00
Simon Mo
6b78837b29
Fix setup.py neuron-ls issue ( #2671 )
2024-03-16 16:00:25 -07:00
Simon Mo
8e67598aa6
[Misc] fix line length for entire codebase ( #3444 )
2024-03-16 00:36:29 -07:00
youkaichao
604f235937
[Misc] add error message in non linux platform ( #3438 )
2024-03-15 21:21:37 +00:00
陈序
739c350c19
[Minor Fix] Use cupy-cuda11x in CUDA 11.8 build ( #3256 )
2024-03-13 09:43:24 -07:00
Zhuohan Li
2f8844ba08
Re-enable the 80 char line width limit ( #3305 )
2024-03-10 19:49:14 -07:00
Woosuk Kwon
1cb0cc2975
[FIX] Make flash_attn optional ( #3269 )
2024-03-08 10:52:20 -08:00
Woosuk Kwon
2daf23ab0c
Separate attention backends ( #3005 )
2024-03-07 01:45:50 -08:00
Robert Shaw
c0c2335ce0
Integrate Marlin Kernels for Int4 GPTQ inference ( #2497 )
...
Co-authored-by: Robert Shaw <114415538+rib-2@users.noreply.github.com>
Co-authored-by: alexm <alexm@neuralmagic.com>
2024-03-01 12:47:51 -08:00
Billy Cao
2c08ff23c0
Fix building from source on WSL ( #3112 )
2024-02-29 11:13:58 -08:00
Philipp Moritz
cfc15a1031
Optimize Triton MoE Kernel ( #2979 )
...
Co-authored-by: Cade Daniel <edacih@gmail.com>
2024-02-26 13:48:56 -08:00
James Whedbee
264017a2bf
[ROCm] include gfx908 as supported ( #2792 )
2024-02-19 17:58:59 -08:00
Hongxia Yang
0580aab02f
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention ( #2768 )
2024-02-10 23:14:37 -08:00
Philipp Moritz
931746bc6d
Add documentation on how to do incremental builds ( #2796 )
2024-02-07 14:42:02 -08:00
Woosuk Kwon
f0d4e14557
Add fused top-K softmax kernel for MoE ( #2769 )
2024-02-05 17:38:02 -08:00
Douglas Lehr
2ccee3def6
[ROCm] Fixup arch checks for ROCM ( #2627 )
2024-02-05 14:59:09 -08:00
wangding zeng
5d60def02c
DeepseekMoE support with Fused MoE kernel ( #2453 )
...
Co-authored-by: roy <jasonailu87@gmail.com>
2024-01-29 21:19:48 -08:00
Rasmus Larsen
ea8489fce2
ROCm: Allow setting compilation target ( #2581 )
2024-01-29 10:52:31 -08:00
zhaoyang-star
9090bf02e7
Support FP8-E5M2 KV Cache ( #2279 )
...
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
Hanzhi Zhou
380170038e
Implement custom all reduce kernels ( #2192 )
2024-01-27 12:46:35 -08:00
Philipp Moritz
390b495ff3
Don't build punica kernels by default ( #2605 )
2024-01-26 15:19:19 -08:00
Hongxia Yang
6b7de1a030
[ROCm] add support to ROCm 6.0 and MI300 ( #2274 )
2024-01-26 12:41:10 -08:00
Antoni Baum
9b945daaf1
[Experimental] Add multi-LoRA support ( #1804 )
...
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
2024-01-23 15:26:37 -08:00
Liangfu Chen
18473cf498
[Neuron] Add an option to build with neuron ( #2065 )
2024-01-18 10:58:50 -08:00
Simon Mo
6e01e8c1c8
[CI] Add Buildkite ( #2355 )
2024-01-14 12:37:58 -08:00
kliuae
1b7c791d60
[ROCm] Fixes for GPTQ on ROCm ( #2180 )
2023-12-18 10:41:04 -08:00
Woosuk Kwon
2acd76f346
[ROCm] Temporarily remove GPTQ ROCm support ( #2138 )
2023-12-15 17:13:58 -08:00
CHU Tianxiang
0fbfc4b81b
Add GPTQ support ( #916 )
2023-12-15 03:04:22 -08:00
TJian
6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main ( #1836 )
...
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
Daya Khudia
c8e7eb1eb3
fix typo in getenv call ( #1972 )
2023-12-07 16:04:41 -08:00
AguirreNicolas
24f60a54f4
[Docker] Adding number of nvcc_threads during build as envar ( #1893 )
2023-12-07 11:00:32 -08:00
Yanming W
e0c6f556e8
[Build] Avoid building too many extensions ( #1624 )
2023-11-23 16:31:19 -08:00
Simon Mo
5ffc0d13a2
Migrate linter from pylint to ruff ( #1665 )
2023-11-20 11:58:01 -08:00
Woosuk Kwon
fd58b73a40
Build CUDA11.8 wheels for release ( #1596 )
2023-11-09 03:52:29 -08:00
Stephen Krider
9cabcb7645
Add Dockerfile ( #1350 )
2023-10-31 12:36:47 -07:00
Jared Roesch
79a30912b8
Add py.typed so consumers of vLLM can get type checking ( #1509 )
...
* Add py.typed so consumers of vLLM can get type checking
* Update py.typed
---------
Co-authored-by: aarnphm <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-30 14:50:47 -07:00
chooper1
1f24755bf8
Support SqueezeLLM ( #1326 )
...
Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-10-21 23:14:59 -07:00
Woosuk Kwon
d0740dff1b
Fix error message on TORCH_CUDA_ARCH_LIST ( #1239 )
...
Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
2023-10-14 14:47:43 -07:00
Antoni Baum
cf5cb1e33e
Allocate more shared memory to attention kernel ( #1154 )
2023-09-26 22:27:13 -07:00
Woosuk Kwon
a425bd9a9a
[Setup] Enable TORCH_CUDA_ARCH_LIST for selecting target GPUs ( #1074 )
2023-09-26 10:21:08 -07:00
Woosuk Kwon
e3e79e9e8a
Implement AWQ quantization support for LLaMA ( #1032 )
...
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
2023-09-16 00:03:37 -07:00
Woosuk Kwon
d6770d1f23
Update setup.py ( #1006 )
2023-09-10 23:42:45 -07:00
Woosuk Kwon
a41c20435e
Add compute capability 8.9 to default targets ( #829 )
2023-08-23 07:28:38 +09:00
Xudong Zhang
65fc1c3127
set default compute capability according to cuda version ( #773 )
2023-08-21 16:05:44 -07:00
Cody Yu
2b7d3aca2e
Update setup.py ( #282 )
...
Co-authored-by: neubig <neubig@gmail.com>
2023-06-27 14:34:23 -07:00
Woosuk Kwon
570fb2e9cc
[PyPI] Fix package info in setup.py ( #158 )
2023-06-19 18:05:01 -07:00
Woosuk Kwon
dcda03b4cb
Write README and front page of doc ( #147 )
2023-06-18 03:19:38 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM ( #150 )
2023-06-17 03:07:40 -07:00
Woosuk Kwon
e38074b1e6
Support FP32 ( #141 )
2023-06-07 00:40:21 -07:00
Woosuk Kwon
376725ce74
[PyPI] Packaging for PyPI distribution ( #140 )
2023-06-05 20:03:14 -07:00