20231088/vllm - vllm - Luminance Code Repo

Author	SHA1	Message	Date
Michael Goin	2344192a55	Optimize moe_align_block_size for deepseek_v3 (#12850) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-13 18:43:37 -05:00
Shiyan Deng	f1042e86f0	[Misc] AMD Build Improvements (#12923)	2025-02-12 02:36:10 -08:00
Gregory Shtrasberg	5b19b93082	[ROCm][Kernel] Using the correct warp_size value	2025-02-05 19:15:08 -08:00
Yang Chen	95460fc513	[Kernel] port sgl moe_align_block_size kernels (#12574) sgl_moe_align_block_size is based on: `ded9fcd09a` moe_align_block_size is based on: `ba5112ff69` Signed-off-by: Yang Chen <yangche@fb.com>	2025-02-03 13:09:50 +08:00
ElizaWszola	221d388cc5	[Bugfix][Kernel] Fix moe align block issue for mixtral (#12413)	2025-01-25 01:49:28 +00:00
Jinzhen Lin	1e60f87bb3	[Kernel] fix moe_align_block_size error condition (#12239) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-01-21 10:30:28 -08:00
Jinzhen Lin	750f4cabfa	[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-01-20 16:42:16 -08:00
Simon Mo	f49777ba62	Deepseek v3 (#11502) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>	2024-12-26 16:09:44 -08:00
Charlie Fu	59449095ab	[Performance][Kernel] Fused_moe Performance Improvement (#9384) Signed-off-by: charlifu <charlifu@amd.com>	2024-10-24 15:37:52 -07:00

9 Commits