vllm/csrc/moe/moe_ops.h

#pragma once

#include <torch/all.h>

void topk_softmax(torch::Tensor& topk_weights, torch::Tensor& topk_indices,
                  torch::Tensor& token_expert_indices,
                  torch::Tensor& gating_output);

void moe_sum(torch::Tensor& input, torch::Tensor& output);

void moe_align_block_size(torch::Tensor topk_ids, int64_t num_experts,
                          int64_t block_size, torch::Tensor sorted_token_ids,
                          torch::Tensor experts_ids,
                          torch::Tensor num_tokens_post_pad);

void sgl_moe_align_block_size(torch::Tensor topk_ids, int64_t num_experts,
                              int64_t block_size,
                              torch::Tensor sorted_token_ids,
                              torch::Tensor experts_ids,
                              torch::Tensor num_tokens_post_pad);
Add fused top-K softmax kernel for MoE (#2769), 2024-02-05 17:38:02 -08:00: `#pragma once`
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047), 2024-06-09 16:23:30 -04:00: `#include <torch/all.h>`
[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722), 2024-05-22 03:18:41 -04:00: `topk_softmax` declaration
[Performance][Kernel] Fused_moe Performance Improvement (#9384), Signed-off-by: charlifu <charlifu@amd.com>, 2024-10-24 17:37:52 -05:00: `moe_sum` and `moe_align_block_size` declarations
[Kernel] port sgl moe_align_block_size kernels (#12574), Signed-off-by: Yang Chen <yangche@fb.com>, 2025-02-02 21:09:50 -08:00: `sgl_moe_align_block_size` declaration
sgl_moe_align_block_size is based on: https://github.com/sgl-project/sglang/commit/ded9fcd09a43d5e7d5bb31a2bc3e9fc21bf65d2a
moe_align_block_size is based on: https://github.com/sgl-project/sglang/commit/ba5112ff691d791a9e38c6c71f59324a5fcb49d0
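
The header only declares `topk_softmax`; its kernel body lives elsewhere. As a rough illustration of what a fused top-K softmax for MoE routing computes, here is a minimal CPU sketch for a single token. It assumes the operation is a softmax over the token's gating logits followed by selecting the `k` largest probabilities, without renormalizing them. The names `TopkResult` and `topk_softmax_reference` are hypothetical and are not part of vLLM; the real op works on batched `torch::Tensor` arguments and also fills `token_expert_indices`, which this sketch omits.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: the selected routing weights and expert ids
// for one token (loosely mirrors topk_weights / topk_indices).
struct TopkResult {
  std::vector<float> weights;
  std::vector<int> indices;
};

// Hypothetical CPU reference for one row of gating logits.
// Assumption: softmax over all experts, then keep the k largest entries.
inline TopkResult topk_softmax_reference(const std::vector<float>& gating_row,
                                         std::size_t k) {
  TopkResult result;
  const std::size_t num_experts = gating_row.size();
  if (num_experts == 0) return result;
  k = std::min(k, num_experts);

  // Numerically stable softmax: subtract the row maximum before exp.
  const float max_logit =
      *std::max_element(gating_row.begin(), gating_row.end());
  std::vector<float> probs(num_experts);
  float denom = 0.0f;
  for (std::size_t i = 0; i < num_experts; ++i) {
    probs[i] = std::exp(gating_row[i] - max_logit);
    denom += probs[i];
  }
  for (float& p : probs) p /= denom;

  // Order expert ids by descending probability; ties go to the lower id.
  std::vector<int> order(num_experts);
  std::iota(order.begin(), order.end(), 0);
  std::partial_sort(order.begin(), order.begin() + k, order.end(),
                    [&probs](int a, int b) {
                      if (probs[a] != probs[b]) return probs[a] > probs[b];
                      return a < b;
                    });

  for (std::size_t j = 0; j < k; ++j) {
    result.indices.push_back(order[j]);
    result.weights.push_back(probs[order[j]]);
  }
  return result;
}
```

Calling `topk_softmax_reference({1.0f, 3.0f, 2.0f, 0.0f}, 2)` selects experts 1 and 2, in that order. A fused GPU kernel would presumably do the same work per token in a single launch instead of separate softmax and top-k passes.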