-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter null rows before aggregation #23052
base: master
Are you sure you want to change the base?
Conversation
{ | ||
PlanNode source = node.getSource(); | ||
if (source instanceof FilterNode) { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't the new filter added by this rule be appended to the existing filter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was proposed to prevent the AggregationNode being optimized many times. Sure,it should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the check should be something like "have all is not null filters already been added to the input to the aggregation node", instead of "is the input to the aggregation node is a filter node"
Can you add unit tests? Also there must be a feature flag controlling this since I am not sure if this optimization will be always effective. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @jackychen718 !
- This changes the sql semantics and can cause wrong results. Therefore this rule should be behind a feature flag and should definitely be disabled by default
- Please add unit tests.
- There are many related test failures. Please fix them before marking it ready for review
{ | ||
PlanNode source = node.getSource(); | ||
if (source instanceof FilterNode) { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the check should be something like "have all is not null filters already been added to the input to the aggregation node", instead of "is the input to the aggregation node is a filter node"
} | ||
Collection<Aggregation> aggregations = node.getAggregations().values(); | ||
for (Aggregation aggregation : aggregations) { | ||
if (aggregation.getCall().getArguments().size() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why limit to aggregate functions of only one argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the example of the problem has only one argument. As the first attempt, I only consider one argument. I will consider arbitrary arguments.
public Result apply(AggregationNode aggregationNode, Captures captures, Context context) | ||
{ | ||
PlanNode source = aggregationNode.getSource(); | ||
if (context.getLookup().resolveGroup(source).findFirst().filter(node -> node instanceof FilterNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary since you already check for this in the pattern detection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I will modify the pattern detection part later on.
|
||
RowExpression andForm = new SpecialFormExpression(AND, BOOLEAN, ImmutableList.copyOf(nullFormList)); | ||
RowExpression notCallPredicate = new CallExpression("not", | ||
new FunctionResolution(createTestFunctionAndTypeManager().getFunctionAndTypeResolver()).notFunction(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use createTestFunctionAndTypeManager. Get it from the optimizer metadata. See other optimizer rules for examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
I will add unit tests. Currently, I did not consider count(*) as a special way. From your reminding, I do not think I can still add filter before aggregation in this scenario. I will exclude this scenario in the pattern match process. Is it ok? @jaystarshot |
e647d6f
to
fd4f28a
Compare
nullFormList.add(nullForm); | ||
} | ||
|
||
RowExpression andForm = new SpecialFormExpression(AND, BOOLEAN, ImmutableList.copyOf(nullFormList)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would use ImmutableList.builder()
for nullFormList
, call add
in the loop, and then pass nullFormList.build()
when generating the andForm
expression. It prevents copying the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
fd4f28a
to
86538ad
Compare
Description
Filter rows with all fileds null before aggregation.
Motivation and Context
The issue comes from #17486
Add optimiztion to drop groups/rows with all fields null
Impact
Add another opitimization rule: RemoveNullRowInAggregation.java
Test Plan
Test if any error made after applying the new optimization rule.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
== NO RELEASE NOTE ==