vllm/quantization at cf069aa8aa38a9003c254f8434a29ec6a3070b08 - vllm - Luminance Code Repo

20231088/vllm

Harry Mellor cf069aa8aa

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)

2024-08-16 14:00:11 -07:00

[Kernel] Fix awq error when n is not divisable by 128 (#13227)

2025-02-13 20:07:05 -08:00

compressed_tensors

[MISC] Replace c10::optional with std::optional (#11730)

2025-01-05 10:20:34 +09:00

[Attention] MLA with chunked prefill (#12639)

2025-02-21 15:30:12 -08:00

[CI/Build] Fix pre-commit errors from #13571 (#13709)

2025-02-22 16:50:38 -08:00

[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851)

2025-02-27 10:39:10 +08:00

[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)

2024-12-13 03:19:23 +00:00

Missing comment explaining VDR variable in GGUF kernels (#13290)

2025-02-20 22:06:54 -08:00

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931)

2025-02-28 22:30:59 -08:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

vectorization.cuh

[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)

2024-12-13 03:19:23 +00:00