vllm/fp8 at b5241e41d9fef56a89dcfda367a7eff87a07e3f7 - vllm - Luminance Code Repo

20231088/vllm

Varun Sundar Rabindranath b5241e41d9

[ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511)

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

2024-07-18 01:38:35 +00:00

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

common.cu

[ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511)

2024-07-18 01:38:35 +00:00

fp8_marlin.cu

[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975)

2024-07-03 17:38:00 +00:00