[CPU] SparseAttention op #21110
PREfast found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
@@ -0,0 +1,321 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
Check warning
Code scanning / lintrunner
CLANGFORMAT/format Warning
Run lintrunner -a to apply this patch.
@@ -0,0 +1,210 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
Check warning
Code scanning / lintrunner
CLANGFORMAT/format Warning
Run lintrunner -a to apply this patch.
sequence_length = sequence_lengths[i % len(sequence_lengths)]
num_heads = heads[i % len(heads)]
head_size = head_sizes[i % len(head_sizes)]
format = formats[i % len(formats)]
Check warning
Code scanning / CodeQL
Variable defined multiple times Warning
redefined
def get_test_cases(provider: str, has_past_kv:bool, comprehensive: bool, debug=False):
    if provider == "CUDAExecutionProvider" and not has_cuda_support():
        return
        yield
Check warning
Code scanning / CodeQL
Unreachable code Warning
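For context on the warning above: in Python, a function becomes a generator if a `yield` appears anywhere in its body, even on a line that can never run. A `yield` placed directly after `return` is therefore only needed when the function has no other `yield`. The sketch below is not the PR's code. `has_cuda_support` is stubbed, the signature is shortened, and the yielded dictionary is a hypothetical stand-in for a real test configuration.

```python
import inspect


def has_cuda_support() -> bool:
    # Stub for illustration only; the real helper lives in the PR's test file.
    return False


def get_test_cases(provider: str):
    # Simplified, hypothetical version of the flagged function.
    if provider == "CUDAExecutionProvider" and not has_cuda_support():
        # A bare return ends the generator without yielding anything.
        return
    for sequence_length in (1, 128):
        # This reachable yield already makes the function a generator.
        yield {"provider": provider, "sequence_length": sequence_length}


def empty_cases():
    # With no other yield in the body, the unreachable yield below is the
    # only thing that makes this a generator rather than a plain function.
    return
    yield


if __name__ == "__main__":
    print(inspect.isgeneratorfunction(get_test_cases))
    print(list(get_test_cases("CUDAExecutionProvider")))
    print(list(empty_cases()))
```

If the flagged function yields test cases further down, as its name suggests, dropping the unreachable `yield` would likely silence the warning without changing behavior. That depends on code not shown in this excerpt.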
Description
Add a CPU implementation of the SparseAttention op. It depends on the CPU Flash Attention work in #20805.
This work is still in progress:
Motivation and Context