Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with SCD Type 2 models #2782

Closed
zerodarkzone opened this issue Jun 17, 2024 · 9 comments · Fixed by #2830
Closed

Problem with SCD Type 2 models #2782

zerodarkzone opened this issue Jun 17, 2024 · 9 comments · Fixed by #2830
Assignees

Comments

@zerodarkzone
Copy link

Hi,
Could you explain what does the enable-preview flag actually does?

I don't know if I explained the problem correctly.

Basically, sqlmesh plan is not updating the schema of the SCD Type 2 models and the backfill fails when trying to do a restatement.
Is that the expected behaviour?

The original model was something like:

MODEL (
  name silver.articulos_curado,
  kind SCD_TYPE_2_BY_COLUMN (
    unique_key id_articulo,
    columns ARRAY(
      estatus_articulo,
      descripcion,
      clave_tipo_articulo,
      clave_marca_proveedor,
      clave_acabado,
      marca_price_shoes
    ),
    on_destructive_change allow
  ),
  cron '@daily',
  grain sid,
  description 'Todos los elementos de la tabla articulo, es decir, todos los productos que maneja la empresa'
);

SELECT
  UUID() AS sid, /* Unique ID */
  ID_ART AS id_articulo, /* Identificador del articulo */
  AR_ESTATUS_STR AS estatus_articulo, /* Estatus del articulo */
  AR_DESC_STR AS descripcion, /* Descripcion del articulo */
  TAR_CVE_N AS clave_tipo_articulo, /* Clave del tipo de articulo ( calzado, accesorio, ropa o folletos) */
  MAR_CVE_N AS clave_marca_proveedor, /* Clave de la marca del articulo */
  ACA_CVE_N AS clave_acabado, /* Clave del acabado del articulo */
  MP_CVE_N AS marca_price_shoes /* Clave de la marca price */
FROM raw.articulo;

Then I just renamed a column and now the model is:

MODEL (
  name silver.articulos_curado,
  kind SCD_TYPE_2_BY_COLUMN (
    unique_key id_articulo,
    columns ARRAY(
      estatus_articulo,
      descripcion,
      clave_tipo_articulo,
      clave_marca_proveedor,
      clave_acabado,
      clave_marca_price
    ),
    on_destructive_change allow
  ),
  cron '@daily',
  grain sid,
  description 'Todos los elementos de la tabla articulo, es decir, todos los productos que maneja la empresa'
);

SELECT
  UUID() AS sid, /* Unique ID */
  ID_ART AS id_articulo, /* Identificador del articulo */
  AR_ESTATUS_STR AS estatus_articulo, /* Estatus del articulo */
  AR_DESC_STR AS descripcion, /* Descripcion del articulo */
  TAR_CVE_N AS clave_tipo_articulo, /* Clave del tipo de articulo ( calzado, accesorio, ropa o folletos) */
  MAR_CVE_N AS clave_marca_proveedor, /* Clave de la marca del articulo */
  ACA_CVE_N AS clave_acabado, /* Clave del acabado del articulo */
  MP_CVE_N AS clave_marca_price /* Clave de la marca price */
FROM raw.articulo;

When running sqlmesh plan dev the view schema is not changed.
When running --restate-model "silver.articulos_curado"
Then I get the following error:

databricks.sql.exc.ServerOperationError: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `marca_price_shoes` cannot be resolved. Did you    
mean one of the following? [`clave_marca_price`, `id_articulo`, `clave_acabado`, `clave_division`, `clave_horma`]. SQLSTATE: 42703; line 1 pos 1000

It is basically telling me that the renamed column does not exists.

Please don't close an issue without responding to the actual problem.

Thanks

Originally posted by @zerodarkzone in #2764 (comment)

@zerodarkzone
Copy link
Author

Hi, I opened a new issue because you closed the latest one without answerig the question.

@izeigerman
Copy link
Member

@eakmanrq can you please have a look at this one.

@izeigerman
Copy link
Member

izeigerman commented Jun 20, 2024

@zerodarkzone when you're restating that model which environment do you target?

@izeigerman
Copy link
Member

@zerodarkzone also can you please attach full logs for the run that causes the reported error.

@zerodarkzone
Copy link
Author

zerodarkzone commented Jun 24, 2024

Hi,
I'm targeting a dev environment.

I tried to replicate everything and now I'm getting different kind of errors

This is the schema of the table when I run the original model.
image
This is the log:
sqlmesh_2024_06_24_10_37_49.log

Then when I change the column name and do the plan again I get the following error:

2024-06-24 10:46:56,221 - MainThread - sqlmesh.core.context - ERROR - Apply Failure: Traceback (most recent call last):
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/utils/concurrency.py", line 69, in _process_node
    self.fn(node)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/utils/concurrency.py", line 165, in <lambda>
    lambda s_id: fn(snapshots_by_id[s_id]),
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/snapshot/evaluator.py", line 293, in <lambda>
    lambda s: self._create_snapshot(
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/snapshot/evaluator.py", line 672, in _create_snapshot
    self.adapter.alter_table(alter_expressions)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/engine_adapter/base.py", line 807, in alter_table
    self.execute(alter_expression)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/engine_adapter/base.py", line 1831, in execute
    self._execute(sql, **kwargs)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/engine_adapter/base.py", line 1837, in _execute
    self.cursor.execute(sql, **kwargs)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/databricks/sql/client.py", line 768, in execute
    execute_response = self.thrift_backend.execute_command(
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 869, in execute_command
    return self._handle_execute_response(resp, cursor)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 961, in _handle_execute_response
    final_operation_state = self._wait_until_command_done(
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 794, in _wait_until_command_done
    self._check_command_not_in_error_or_closed_state(
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 598, in _check_command_not_in_error_or_closed_state
    raise ServerOperationError(
databricks.sql.exc.ServerOperationError: [DELTA_UNSUPPORTED_DROP_COLUMN] DROP COLUMN is not supported for your Delta table. 
Please enable Column Mapping on your Delta table with mapping mode 'name'.
You can use one of the following commands.

If your table is already on the required protocol version:
ALTER TABLE table_name SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name')

If your table is not on the required protocol version and requires a protocol upgrade:
ALTER TABLE table_name SET TBLPROPERTIES (
   'delta.columnMapping.mode' = 'name',
   'delta.minReaderVersion' = '2',
   'delta.minWriterVersion' = '5')
Refer to table versioning at https://docs.databricks.com/delta/versioning.html

sqlmesh_2024_06_24_10_44_41.log

After this, if I run the plan again, it finishes "without problems":
sqlmesh_2024_06_24_10_48_12.log
But the schema stays the same:
image

At the end, when I try to do a restatement, I get the following error (Leaving the backfill dates blank):
image

2024-06-24 10:59:25,621 - MainThread - sqlmesh.core.context - ERROR - Apply Failure: Traceback (most recent call last):
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/context.py", line 1172, in apply
    self._apply(plan, circuit_breaker)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/context.py", line 1723, in _apply
    self._scheduler.create_plan_evaluator(self).evaluate(plan, circuit_breaker=circuit_breaker)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/plan/evaluator.py", line 112, in evaluate
    self._backfill(
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/plan/evaluator.py", line 162, in _backfill
    is_run_successful = scheduler.run(
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/core/scheduler.py", line 232, in run
    validate_date_range(start, end)
  File "/home/juanb/mambaforge-pypy3/envs/priceshoes/lib/python3.10/site-packages/sqlmesh/utils/date.py", line 301, in validate_date_range
    raise ValueError(
ValueError: Start date / time (1719100800000) can't be greater than end date / time (1718668800000)
 (context.py:1180)

sqlmesh_2024_06_24_10_57_53.log

If I input the backfill dates, then I get the folliwing error:

databricks.sql.exc.ServerOperationError: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `marca_price_shoes` cannot be resolved. Did you mean one of the following? [`clave_marca_price`, `id_articulo`, `clave_acabado`, `clave_marca_proveedor`, `descripcion`]. SQLSTATE: 42703; line 1 pos 340

sqlmesh_2024_06_24_11_01_44.log

@izeigerman
Copy link
Member

izeigerman commented Jun 26, 2024

@zerodarkzone you're getting this error:

ValueError: Start date / time (1719100800000) can't be greater than end date / time (1718668800000)
 (context.py:1180)

because in dev the default start for forward-only changes is "yesterday" (2024-06-23) in your case, not the "beginning of history" as the message suggests. We should've fixed the message, which version are you on?

So this error will be gone if you set the start date to before 06-18 or adjust your end date.

@izeigerman
Copy link
Member

With regard to the DROP error, try setting the following in the model definition and see if it helps:

MODEL (
  ...,
  physical_properties (
    'delta.columnMapping.mode' = 'name'
  )
);

To reflect the suggestion in the error message.

@izeigerman
Copy link
Member

The fact that subsequent plan succeeds after previously failing to update the table's schema is an actual bug here. Basically SQLMesh creates a dev table but fails to update its schema. Next time the plan runs, it checks that the table already exists and skips it. This is something we're going to address.

@izeigerman
Copy link
Member

We will also update the docs to explain how to deal with the databricks.sql.exc.ServerOperationError: [DELTA_UNSUPPORTED_DROP_COLUMN] DROP COLUMN is not supported for your Delta table. error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants