Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Allows table scans to be disabled, even for multi key lookups #9631

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

AlanConfluent
Copy link
Member

Description

Allows table scans to be disabled, even when there are multiple keys or multi-column keys. Table scans were introduced to fix a serde bug in https://github.com/confluentinc/ksql/pull/8876/files#diff-ebcd3b261b6ef2904811e7d63a7158f0c4273b0f814c81bbfb8c8fc139a31322R125 . This worked, but always stopped the key lookup optimization for multiple keys or multi-column keys.

Now, this PR gives the option to set ksql.query.pull.table.scan.enabled=false in a request to still get the optimization, if the serde bug isn't an issue. Ideally, we would fix the issue so that table scans weren't attempted in this scenario anyway, but that's for a followup PR.

Testing done

Added unit tests, ran manually.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@AlanConfluent AlanConfluent requested a review from a team as a code owner October 7, 2022 18:34
@@ -210,6 +210,12 @@ private PullPhysicalPlanType getPlanType() {
} else if (lookupConstraints.stream().anyMatch(lc -> lc instanceof NonKeyConstraint)) {
lookupConstraints = Collections.emptyList();
return PullPhysicalPlanType.TABLE_SCAN;
// For issue #7174. Temporarily enable table scans if there are more than one key or if that
// key is a multi-column key.
} else if (lookupConstraints.stream().anyMatch(lc -> ((KeyConstraint) lc).getKey().size() > 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is just checking whether any of the key constraints involve more than one column right now. Should we also check if the table has multiple key columns, or is that handled elsewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is the PR description still accurate? The description mentions the config ksql.query.pull.table.scan.enabled but the actual implementation seems different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is just checking whether any of the key constraints involve more than one column right now. Should we also check if the table has multiple key columns, or is that handled elsewhere?

Extracting the key information has already happened at this point and should match the schema for the table. A KeyConstraint will only be created if the AST makes references to columns which fully resolve a key, otherwise we throw an error. So no addition error checking should be required at this point. It's already happened at the logical planner level.

Also, is the PR description still accurate? The description mentions the config ksql.query.pull.table.scan.enabled but the actual implementation seems different.

queryPlannerOptions.getTableScansEnabled() is doing that config check in PullQueryConfigPlannerOptions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queryPlannerOptions.getTableScansEnabled() is doing that config check in PullQueryConfigPlannerOptions

But the change in this PR means that pull queries involving multiple key constraints against a table with a single key column will never be executed as a table scan now, regardless of whether the config is set to true or false, right? The PR description makes it sound like these types of queries will only be executed as non-table-scans if the config is set to false.

And for pull queries with key constraints against tables with multiple key columns, these queries will fail if the config is not enabled. (Is that what's already happening today?)

Copy link
Contributor

@hli21 hli21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

cla-assistant bot commented Nov 15, 2023

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants