vllm/quantization at c11de33dad217bca79225128059b6fac7e1b2519 - vllm - Luminance Code Repo

20231088/vllm

Tyler Michael Smith c11de33dad

[Bugfix][Kernel] Fix per-token/per-channel quantization for Hopper scaled mm (#12696)

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

2025-02-03 13:04:59 -08:00

[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)

2024-08-16 14:00:11 -07:00

[CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001)

2024-08-05 16:00:01 -04:00

compressed_tensors

[MISC] Replace c10::optional with std::optional (#11730)

2025-01-05 10:20:34 +09:00

[Bugfix][Kernel] Fix per-token/per-channel quantization for Hopper scaled mm (#12696)

2025-02-03 13:04:59 -08:00

[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)

2024-12-13 03:19:23 +00:00

[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)

2024-12-13 03:19:23 +00:00

[AMD] Add support for GGUF quantization on ROCm (#10254)

2024-11-22 21:14:49 -08:00

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

vectorization.cuh

[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)

2024-12-13 03:19:23 +00:00