
[CPU] SparseAttention op #21110

Draft · tianleiwu wants to merge 16 commits into main
Conversation

tianleiwu
Contributor

@tianleiwu tianleiwu commented Jun 20, 2024

Description

Add a CPU implementation of the SparseAttention operator. It depends on the CPU Flash Attention work in #20805.

This work is still in progress:

  • Refactoring GQAAttentionBase
  • Add SparseAttention implementation
  • Add test cases
  • Test performance
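For context on what the operator computes: in block-sparse attention, score blocks excluded by a block mask are set to negative infinity before the softmax, so the masked key/value blocks contribute zero weight. The single-head NumPy sketch below is a toy illustration of that idea only, not this PR's implementation; all names (block_sparse_attention, block_mask, block_size) are illustrative, and it assumes every query block keeps at least one key block (otherwise a softmax row would be all -inf).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block_mask, block_size):
    # q, k, v: (seq_len, head_size); block_mask: (num_blocks, num_blocks) bool.
    # Assumes seq_len is a multiple of block_size and each query block
    # attends to at least one key block.
    seq_len, head_size = q.shape
    scores = q @ k.T / np.sqrt(head_size)
    num_blocks = seq_len // block_size
    for qb in range(num_blocks):
        for kb in range(num_blocks):
            if not block_mask[qb, kb]:
                # Masked-out block: -inf scores become zero softmax weight.
                scores[qb * block_size:(qb + 1) * block_size,
                       kb * block_size:(kb + 1) * block_size] = -np.inf
    return softmax(scores) @ v
```

With an all-True mask this reduces exactly to dense attention; a lower-triangular block mask gives block-causal attention.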

Motivation and Context

@tianleiwu tianleiwu requested a review from a team as a code owner June 20, 2024 03:43
@tianleiwu tianleiwu marked this pull request as draft June 20, 2024 03:43

@github-advanced-security github-advanced-security bot left a comment


PREfast found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@@ -0,0 +1,321 @@
// Copyright (c) Microsoft Corporation. All rights reserved.

Check warning: Code scanning / lintrunner, CLANGFORMAT/format
See https://clang.llvm.org/docs/ClangFormat.html. Run lintrunner -a to apply this patch.
@@ -0,0 +1,210 @@
// Copyright (c) Microsoft Corporation. All rights reserved.

Check warning: Code scanning / lintrunner, CLANGFORMAT/format
See https://clang.llvm.org/docs/ClangFormat.html. Run lintrunner -a to apply this patch.
sequence_length = sequence_lengths[i % len(sequence_lengths)]
num_heads = heads[i % len(heads)]
head_size = head_sizes[i % len(head_sizes)]
format = formats[i % len(formats)]

Check warning: Code scanning / CodeQL, Variable defined multiple times
This assignment to 'format' is unnecessary as it is redefined before this value is used.
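What CodeQL is flagging here is a dead store: a value is assigned to format and then overwritten before it is ever read. A minimal illustration of the pattern and the usual fix (the names below are hypothetical, not the PR's actual code):

```python
formats = ["BSNH", "BNSH"]

def pick_format(i):
    fmt = formats[i % len(formats)]  # dead store: overwritten below before use
    fmt = "BSNH"                     # this redefinition triggers the warning
    return fmt

def pick_format_fixed(i):
    # Fix: keep only the assignment whose value is actually used.
    return formats[i % len(formats)]
```

The fix is to delete the first assignment (or actually use its value) so each variable is written once per read.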
def get_test_cases(provider: str, has_past_kv: bool, comprehensive: bool, debug=False):
    if provider == "CUDAExecutionProvider" and not has_cuda_support():
        return
        yield

Check warning: Code scanning / CodeQL, Unreachable code
This statement is unreachable.
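Assuming the function yields test cases further down (the snippet is truncated), a bare return in a generator function already ends iteration, so the yield placed after it never executes. A minimal sketch of that behavior, with illustrative names:

```python
def cases(enabled):
    # In a generator, a bare `return` ends iteration on its own; a `yield`
    # after it would be unreachable, which is what CodeQL reports.
    if not enabled:
        return
    yield "case_1"
    yield "case_2"
```

If a function contains no yield at all and must still be a generator, the unreachable-yield idiom is sometimes used deliberately; here the later yields make it unnecessary.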
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close these issues (none yet)
1 participant