vllm/weight_loading at 5e5c8e091eacc16672a0a8265eb5cb0ece85d24b

20231088/vllm

Michael Goin 5e5c8e091e

[Quant][Perf] Use moe_wna16 kernel by default for MoEs with many experts (#13236)

Signed-off-by: mgoin <mgoin64@gmail.com>

2025-02-14 12:53:42 -08:00

models-large.txt

[Misc] Update w2 scale loading for GPTQMarlinMoE (#12757)

2025-02-06 01:02:14 -08:00

models.txt

[Attention] MLA decode optimizations (#12528)

2025-01-30 23:49:37 -08:00

run_model_weight_loading_test.sh

[Attention] MLA decode optimizations (#12528)

2025-01-30 23:49:37 -08:00

test_weight_loading.py

[Quant][Perf] Use moe_wna16 kernel by default for MoEs with many experts (#13236)

2025-02-14 12:53:42 -08:00