[Bugfix][Kernel] FA3 Fix - RuntimeError: This flash attention build only supports pack_gqa (for build size reasons). (#12405)

Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-01-24 15:20:59 -05:00 · 3132a933b6
commit 3132a933b6
parent df5dafaa5b
1 changed file with 1 addition and 1 deletion
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -576,7 +576,7 @@ else()
  FetchContent_Declare(
          vllm-flash-attn
          GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
-          GIT_TAG 0aff05f577e8a10086066a00618609199b25231d
+          GIT_TAG 9732b0ce005d1e6216864788502d5570004678f5
          GIT_PROGRESS TRUE
          # Don't share the vllm-flash-attn build between build types
          BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn