Raise/warn on incomplete columns in normalize #1504

steinitzu · 2024-06-21T01:40:21Z

Description

Turns the "unbound column" warning into an exception for not-null columns and moves the check to normalize.
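
For readers who have not seen the check, the behaviour described above might look roughly like the sketch below. It is an illustration only, not the code in this PR: `UnboundColumnError`, `is_complete`, and `check_incomplete_columns` are hypothetical names, and the assumption that a column is "complete" once it has a name and a data type is mine.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


class UnboundColumnError(Exception):
    """Hypothetical exception used only in this sketch."""


def is_complete(column: Dict[str, Any]) -> bool:
    # Assumption: a column is complete once it has both a name and a data type,
    # i.e. it was actually seen in the data or fully declared by the user.
    return bool(column.get("name")) and bool(column.get("data_type"))


def check_incomplete_columns(
    table_name: str, columns: Dict[str, Dict[str, Any]]
) -> List[str]:
    """Raise for incomplete not-null columns, warn for incomplete nullable ones.

    Returns the names of the columns that only produced a warning.
    """
    warned: List[str] = []
    for name, column in columns.items():
        if is_complete(column):
            continue
        if column.get("nullable", True) is False:
            # A not-null column (e.g. a misspelled merge or primary key)
            # that never received data is treated as an error.
            raise UnboundColumnError(
                f"Column {name!r} in table {table_name!r} is not nullable "
                "but did not receive any data."
            )
        logger.warning(
            "Column %r in table %r did not receive any data.", name, table_name
        )
        warned.append(name)
    return warned
```

With a misspelled key such as `idd` declared as not nullable, the sketch raises instead of only logging a warning, which is the failure mode reported in #1463.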

Related Issues

Fixes Wrong Merge Key Not Throwing Error #1463

Additional Context

netlify · 2024-06-21T01:40:38Z

✅ Deploy Preview for dlt-hub-docs canceled.

Name	Link
🔨 Latest commit	`7f36f97`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/667de26731a47600086202fd

sh-rp · 2024-06-24T11:54:54Z

I'm wondering if we should not do this in the extraction step already. All columns that are non-nullable (and merge and primary keys should be that) should raise if not populated. Extraction spends time on I/O mostly and not on python code as the normalizer, so the check would not make a big difference in performance in my opinion.

steinitzu · 2024-06-24T13:55:51Z

> I'm wondering if we should not do this in the extraction step already. All columns that are non-nullable (and merge and primary keys should be that) should raise if not populated. Extraction spends time on I/O mostly and not on python code as the normalizer, so the check would not make a big difference in performance in my opinion.

I agree it would be much better to fail early if possible. Ideally we could tell right after the first data item is extracted.
But I wasn't sure if we can always tell whether the column is populated in extract. The "seen data" marker for the table is only set in normalize so I was going by that. But I'll give it a try.

dlt/common/schema/exceptions.py

sh-rp · 2024-06-26T14:04:32Z

tests/load/pipeline/test_merge_disposition.py

@@ -989,3 +989,24 @@ def r():
    with pytest.raises(PipelineStepFailed) as pip_ex:
        p.run(r())
    assert isinstance(pip_ex.value.__context__, SchemaException)
+
+
+@pytest.mark.parametrize(


Can you write a test (or check if one exists) to see what happens when we do a merge on merge keys but some rows have null in the merge key? It's not super important right now, but it would be interesting to know what happens :)

I couldn't find a test, so I added one. This was already raising an exception through `schema.coerce_row` in normalize.
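
To make the case under discussion concrete, here is a minimal sketch of a row-level check that rejects a null in a not-null column. It does not reproduce `schema.coerce_row`; `NullInNotNullColumnError` and `check_row_nulls` are hypothetical names, and the column hints are plain dicts chosen for the example.

```python
from typing import Any, Dict


class NullInNotNullColumnError(Exception):
    """Hypothetical exception used only in this sketch."""


def check_row_nulls(
    table_name: str, row: Dict[str, Any], columns: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the row unchanged, or raise if a not-null column is missing or None."""
    for name, column in columns.items():
        # Columns are treated as nullable unless explicitly marked otherwise.
        if column.get("nullable", True) is False and row.get(name) is None:
            raise NullInNotNullColumnError(
                f"Row in table {table_name!r} has no value for "
                f"not-null column {name!r}."
            )
    return row


# Example hints: "id" acts as a merge key and therefore must not be null.
columns = {
    "id": {"data_type": "bigint", "nullable": False, "merge_key": True},
    "note": {"data_type": "text", "nullable": True},
}
valid_row = check_row_nulls("items", {"id": 1, "note": None}, columns)
```

A row such as `{"id": None, "note": "x"}` would raise in this sketch, which mirrors the behaviour the new test pins down.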

sh-rp

Looks good, small requests

steinitzu · 2024-06-26T23:56:11Z

> I'm wondering if we should not do this in the extraction step already. All columns that are non-nullable (and merge and primary keys should be that) should raise if not populated. Extraction spends time on I/O mostly and not on python code as the normalizer, so the check would not make a big difference in performance in my opinion.

I also looked into whether this is possible, but I don't think it is without moving a lot of normalize logic into extract. I wasn't sure how much schema inference is done in extract; it seems there is none.

sh-rp reviewed Jun 26, 2024

dlt/common/schema/exceptions.py (outdated)

sh-rp reviewed Jun 26, 2024

sh-rp requested changes Jun 26, 2024

sh-rp added the sprint label Jun 26, 2024

steinitzu force-pushed the fix/error-missing-merge-key branch 2 times, most recently from d16217f to a855f32 Compare June 26, 2024 23:49

steinitzu added 5 commits June 27, 2024 14:47

Raise/warn on incomplete columns in normalize

c32970e

Raise on not-nullable columns to catch e.g. misspelled merge/primary key key

Update error msg

b5a8972

Test for null values

abd7baf

Lint

a2b80d9

Delete now invalid tests

0d0afa5

steinitzu force-pushed the fix/error-missing-merge-key branch from 1db75e1 to 0d0afa5 Compare June 27, 2024 18:47

Fix common test

7f36f97

steinitzu marked this pull request as ready for review June 27, 2024 22:06
