Jinzhen Lin
d06ba4ed3f
[Kernel] moe wna16 marlin kernel (#14447)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-14 20:05:22 -07:00
TJian
916836bbfb
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-12 09:31:19 -07:00
Sage Moore
45f3f3f59e
[ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-12 08:00:28 -04:00
Jinzhen Lin
90e88ab756
[Kernel] moe wna16 cuda kernel (#13321)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-10 20:12:40 -04:00
Michael Goin
2344192a55
Optimize moe_align_block_size for deepseek_v3 (#12850)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-13 18:43:37 -05:00
Shiyan Deng
f1042e86f0
[Misc] AMD Build Improvements (#12923)
2025-02-12 02:36:10 -08:00
Gregory Shtrasberg
5b19b93082
[ROCm][Kernel] Using the correct warp_size value
2025-02-05 19:15:08 -08:00
Yang Chen
95460fc513
[Kernel] port sgl moe_align_block_size kernels (#12574)
sgl_moe_align_block_size is based on:
ded9fcd09a
moe_align_block_size is based on:
ba5112ff69
Signed-off-by: Yang Chen <yangche@fb.com>
2025-02-03 13:09:50 +08:00
Harry Mellor
823ab79633
Update pre-commit hooks (#12475)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00
ElizaWszola
221d388cc5
[Bugfix][Kernel] Fix moe align block issue for mixtral (#12413)
2025-01-25 01:49:28 +00:00
Jinzhen Lin
1e60f87bb3
[Kernel] fix moe_align_block_size error condition (#12239)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-01-21 10:30:28 -08:00
Jinzhen Lin
750f4cabfa
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-20 16:42:16 -08:00
Simon Mo
f49777ba62
Deepseek v3 (#11502)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
2024-12-26 16:09:44 -08:00
Charlie Fu
59449095ab
[Performance][Kernel] Fused_moe Performance Improvement (#9384)
Signed-off-by: charlifu <charlifu@amd.com>
2024-10-24 15:37:52 -07:00
bnellnm
eca2c5f7c0
[Bugfix] Fix support for dimension like integers and ScalarType (#9299)
2024-10-17 19:08:34 +00:00
ElizaWszola
05d686432f
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
2024-10-04 12:34:44 -06:00
Lucas Wilkinson
aeb37c2a72
[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)
2024-10-03 22:55:25 -04:00
ElizaWszola
d081da0064
[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-09-28 18:19:40 -07:00
ElizaWszola
a928ded995
[Kernel] Split Marlin MoE kernels into multiple files (#8661)
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-09-24 09:31:42 -07:00
Tyler Michael Smith
d66ac62854
[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643)
2024-09-21 23:45:02 +00:00
Tyler Michael Smith
4c34ce8916
[Kernel] Remove marlin moe templating on thread_m_blocks (#8573)
Co-authored-by: lwilkinson@neuralmagic.com
2024-09-19 01:42:49 +00:00
ElizaWszola
a091e2da3e
[Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
2024-09-16 09:47:19 -06:00
Dipika Sikka
6cd5e5b07e
[Misc] Fused MoE Marlin support for GPTQ (#8217)
2024-09-09 23:02:52 -04:00
Dipika Sikka
fc911880cc
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
2024-08-27 15:07:09 -07:00
Michael Goin
aae74ef95c
Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel ( #7527 )" ( #7764 )
2024-08-22 03:42:14 +00:00
Dipika Sikka
8678a69ab5
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
2024-08-21 16:17:10 -07:00
Lucas Wilkinson
a8d604ca2a
[Misc] Disambiguate quantized types via a new ScalarType (#6396)
2024-08-02 13:51:58 -07:00
bnellnm
5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)
2024-06-09 16:23:30 -04:00
Divakar Verma
a66cf40b20
[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927)
This PR enables the fused topk_softmax kernel used in the moe layer for HIP.
2024-06-02 14:13:26 -07:00
Michael Goin
5f6d10c14c
[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)
2024-05-22 07:18:41 +00:00
Woosuk Kwon
f0d4e14557
Add fused top-K softmax kernel for MoE (#2769)
2024-02-05 17:38:02 -08:00