vllm/csrc at 220a47627bf48c728ce0a2737be39c400bb6f653 - vllm - Luminance Code Repo

Casper beb89f68b4

AWQ: Up to 2.66x higher throughput (#2566)

2024-01-26 23:53:17 -08:00

[FIX] Support non-zero CUDA devices in custom kernels (#1959)

2024-01-02 19:09:59 -08:00

[Experimental] Add multi-LoRA support (#1804)

2024-01-23 15:26:37 -08:00

AWQ: Up to 2.66x higher throughput (#2566)

2024-01-26 23:53:17 -08:00

activation_kernels.cu

[FIX] Support non-zero CUDA devices in custom kernels (#1959)

2024-01-02 19:09:59 -08:00

cache_kernels.cu

Use a correct device when creating OptionalCUDAGuard (#2583)

2024-01-25 23:48:17 -08:00

cache.h

Avoid multiple redefinition (#1817)

2023-12-14 09:35:58 -08:00

cuda_compat.h

Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)

2023-12-07 23:16:52 -08:00

cuda_utils_kernels.cu

[ROCm] add support to ROCm 6.0 and MI300 (#2274)

2024-01-26 12:41:10 -08:00

cuda_utils.h

[ROCm] add support to ROCm 6.0 and MI300 (#2274)

2024-01-26 12:41:10 -08:00

dispatch_utils.h

Avoid multiple redefinition (#1817)

2023-12-14 09:35:58 -08:00

layernorm_kernels.cu

[FIX] Support non-zero CUDA devices in custom kernels (#1959)

2024-01-02 19:09:59 -08:00

ops.h

AWQ: Up to 2.66x higher throughput (#2566)

2024-01-26 23:53:17 -08:00

pos_encoding_kernels.cu

[FIX] Support non-zero CUDA devices in custom kernels (#1959)

2024-01-02 19:09:59 -08:00

pybind.cpp

AWQ: Up to 2.66x higher throughput (#2566)

2024-01-26 23:53:17 -08:00

reduction_utils.cuh

Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)

2023-12-07 23:16:52 -08:00