-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: data diffing with SQLMesh does NOT work correctly if primary keys match with duplicates #2810
Comments
this is user error. grain is not validated and up to the user to ensure interegrity through an audit. |
That's a non-obvious thing for a user trying this for the first time or the 50th time. We should reduce that friction with a clear message that for accurate results, the grain should be tested. Example below. (venv) ➜ sqlmesh-demos git:(migration-demo) ✗ sqlmesh table_diff prod:dev demo.full_model --show-sample
grain should have unique and not_null audits for accurate results
Schema Diff Between 'PROD' and 'DEV' environments for model 'demo.full_model':
└── Schemas match
Row Counts:
└── FULL MATCH: 6 rows (100.0%)
COMMON ROWS column comparison stats:
pct_match
num_orders 100.0
COMMON ROWS sample data differences:
All joined rows match
|
maybe a count distinct could be shoved into the data diff query, if you or perhaps @Themiscodes has time you could make a pr for this. basically i'm looking for a way to put in some kind of uniqueness check into the singular data diff query, it could be an optional flag that is injected by default but turned off due to overhead |
I'll leave it to @Themiscodes What about checking for nulls? Those get missed too and mislead the user that there's no data diff. |
Sure I'll look into this @sungchun12 |
The above is an example tablediff using the default sqlmesh projected generated with the CLI and using duckdb. The expected result should error out and say that the primary key (
grain
) is not unique and not null and prompt the user to update thegrain
or fix the primary key issue.Steps to reproduce:
sqlmesh init duckdb
sqlmesh plan
and apply the changesfull_model
belowsqlmesh plan dev
and apply the changessqlmesh table_diff prod:dev sqlmesh_example.full_model --show-sample
The text was updated successfully, but these errors were encountered: