vllm/moe at 6b40996ae8a5b065de5b3b650b5b1324f67f6334

20231088/vllm

TJian 916836bbfb

[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

2025-03-12 09:31:19 -07:00

..

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

marlin_moe_ops.cu

[Bugfix] Fix support for dimension like integers and ScalarType (#9299)

2024-10-17 19:08:34 +00:00

moe_align_sum_kernels.cu

Optimize moe_align_block_size for deepseek_v3 (#12850)

2025-02-13 18:43:37 -05:00

moe_ops.h

[ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629)

2025-03-12 08:00:28 -04:00

moe_wna16_utils.h

[Kernel] moe wna16 cuda kernel (#13321)

2025-03-10 20:12:40 -04:00

moe_wna16.cu

[Kernel] moe wna16 cuda kernel (#13321)

2025-03-10 20:12:40 -04:00

topk_softmax_kernels.cu

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

torch_bindings.cpp

[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)

2025-03-12 09:31:19 -07:00