
[SPARK-48709][SQL] Fix varchar type resolution mismatch for DataSourceV2 CTAS #47082

Closed
wants to merge 1 commit

Conversation

@wangyum (Member) commented Jun 25, 2024

What changes were proposed in this pull request?

This PR fixes a varchar type resolution mismatch for DataSourceV2 CTAS. For example:

```sql
set spark.sql.storeAssignmentPolicy=LEGACY;
CREATE TABLE testcat.ns.t1 (d1 string, d2 varchar(200)) USING parquet;
CREATE TABLE testcat.ns.t2 USING foo as select * from testcat.ns.t1
```

Error message:

```
org.apache.spark.sql.AnalysisException: LEGACY store assignment policy is disallowed in Spark data source V2. Please set the configuration spark.sql.storeAssignmentPolicy to other values.
```

Why are the changes needed?

Avoid query failures.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label on Jun 25, 2024
@wangyum (Member, Author) commented Jun 25, 2024

cc @cloud-fan

```diff
@@ -1739,6 +1739,16 @@ class DataSourceV2SQLSuiteV1Filter
     }
   }
 
+  test("SPARK-48709: varchar resolution mismatch for DataSourceV2 CTAS") {
+    withSQLConf(
+      SQLConf.STORE_ASSIGNMENT_POLICY.key -> SQLConf.StoreAssignmentPolicy.LEGACY.toString) {
```

Contributor: just for my education, why it's only a problem with the legacy store assignment?

Member: Maybe we can test all options here.

@wangyum (Member, Author) commented Jun 25, 2024:
We validate the store assignment policy when `!v2Write.outputResolved` is true:

```scala
case v2Write: V2WriteCommand
    if v2Write.table.resolved && v2Write.query.resolved && !v2Write.outputResolved =>
  validateStoreAssignmentPolicy()
```

Similar to `outType`, we should get the raw type here so that the output is resolved.
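
For context, the sketch below is illustrative only, not this PR's change: it assumes spark-catalyst on the classpath, and `VarcharResolutionSketch` is a made-up name. It shows how a `varchar(200)` column is stored as `string` plus metadata and how `CharVarcharUtils.getRawType` recovers the declared type from that metadata. If only one side of the output-resolution comparison does this, the types don't line up, `outputResolved` stays false, and `validateStoreAssignmentPolicy()` is what rejects the LEGACY policy.

```scala
// Illustrative sketch only (not the PR's change); relies on Spark's internal
// catalyst API, so it needs spark-catalyst on the classpath.
import org.apache.spark.sql.catalyst.util.CharVarcharUtils
import org.apache.spark.sql.types._

object VarcharResolutionSketch {
  def main(args: Array[String]): Unit = {
    // A table declared as (d1 string, d2 varchar(200)).
    val declared = StructType(Seq(
      StructField("d1", StringType),
      StructField("d2", VarcharType(200))))

    // Catalogs keep varchar columns as string plus metadata recording the declared type.
    val stored = CharVarcharUtils.replaceCharVarcharWithStringInSchema(declared)
    stored.foreach(f => println(s"${f.name}: ${f.dataType}, metadata = ${f.metadata}"))

    // getRawType recovers varchar(200) from that metadata. If one side of the
    // output-resolution check sees VarcharType(200) while the other sees plain
    // StringType, the types mismatch, outputResolved stays false, and the LEGACY
    // store assignment policy check fires.
    val raw = stored.map(f => CharVarcharUtils.getRawType(f.metadata).getOrElse(f.dataType))
    println(raw.mkString(", ")) // d1 stays StringType, d2 comes back as VarcharType(200)
  }
}
```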

Contributor: How does this end up with the error `org.apache.spark.sql.AnalysisException: LEGACY store assignment policy is disallowed in Spark data source V2.`?

Contributor: I mean, it looks like the right error.

Contributor: oh nvm, when we don't need to do store assignment, then the mode doesn't matter.

@yaooqinn closed this in e23d69b on Jun 26, 2024
@yaooqinn (Member) commented:

Merged to master, thank you @wangyum @cloud-fan

Could you send a backport PR to branch-3.5? @wangyum

@wangyum deleted the SPARK-48709 branch on June 26, 2024 at 13:55
yaooqinn pushed a commit that referenced this pull request Jun 27, 2024
…SourceV2 CTAS

Backport of #47082.

### What changes were proposed in this pull request?

This PR fixes a varchar type resolution mismatch for DataSourceV2 CTAS. For example:
```sql
set spark.sql.storeAssignmentPolicy=LEGACY;
CREATE TABLE testcat.ns.t1 (d1 string, d2 varchar(200)) USING parquet;
CREATE TABLE testcat.ns.t2 USING foo as select * from testcat.ns.t1
```
Error message:
```
org.apache.spark.sql.AnalysisException: LEGACY store assignment policy is disallowed in Spark data source V2. Please set the configuration spark.sql.storeAssignmentPolicy to other values.
```

### Why are the changes needed?

Avoid query failures.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47103 from wangyum/SPARK-48709-branch-3.5.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Kent Yao <[email protected]>