Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-665] [Feature] Allow control over the outer CTE identifier generated in ephemeral materialization #5273

Closed
1 task done
miro-ur opened this issue May 18, 2022 · 4 comments · May be fixed by dbt-labs/dbt-adapters#236 or #10290
Closed
1 task done
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code

Comments

@miro-ur
Copy link

miro-ur commented May 18, 2022

Is this your first time opening an issue?

Describe the Feature

We use a dot notation to group and organize dbt model files with a naming pattern of {group}.{model} such as ingest.source and dw.dimension etc. With specially complex models this provides a level of organization and consistency that keeps files organized, unique and faster to identify and find.

We rely on config aliases to then properly name relations - otherwise adding schema would be an invalid fully qualified relation name like schema.layer.table. This works for both table and view materialization, however causes the generated sql to be invalid when using the ephemeral materialization due to the outer CTE identifier being generated as __dbt__cte__ingest.source.

If model alias is configured it should be used everywhere as the identifier of the output instead of the file name - regardless of materialization used to give end user better control over generated SQL code and avoid errors or potential invalid code being generated when using an unconventional naming of model files.

Describe alternatives you've considered

An alternative might be some form of sanitization of CTE name during generation to avoid generating invalid CTE identifier - like stripping non alphanumeric characters or generating a non model name linked name - like an incremental __dbt__cte__1 - as long as its a valid and unique CTE name to enable stacking CTEs.

Who will this benefit?

This feature would introduce consistency in behavior compared to other materialization strategies (alias being used as table or view name), give uses control over naming of models vs files and allow end users to come up with creative naming convention and organization of model files without having to worry that switching materialization will cause a failure because of the model file name.

Are you interested in contributing this feature?

Happy to contribute, looking for feedback and some pointers on this from the team

Anything else?

No response

@miro-ur miro-ur added enhancement New feature or request triage labels May 18, 2022
@github-actions github-actions bot changed the title [Feature] Allow control over the outer CTE identifier generated in ephemeral materialization [CT-665] [Feature] Allow control over the outer CTE identifier generated in ephemeral materialization May 18, 2022
@jtcohen6 jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Adapters Issues designated for the adapter area of the code labels May 19, 2022
@jtcohen6
Copy link
Contributor

@miro-ur Thanks for opening!

This is something that's defined in Python today, but within the "adapter" interface, to accommodate the fact that different databases support different naming conventions:

@staticmethod
def add_ephemeral_prefix(name: str):
return f"__dbt__cte__{name}"
@classmethod
def create_ephemeral_from_node(
cls: Type[Self],
config: HasQuoting,
node: Union[ParsedNode, CompiledNode],
) -> Self:
# Note that ephemeral models are based on the name.
identifier = cls.add_ephemeral_prefix(node.name)
return cls.create(
type=cls.CTE,
identifier=identifier,
).quote(identifier=False)

I don't know that we'd get much benefit from turning this into a macro, and it would require a lot of code plumbing to set up a Jinja rendering context where it isn't currently needed.

I think both of your alternative proposals are totally reasonable:

  • Use node.alias instead of node.name for creating the CTE name
  • Replace non-alphanumeric characters with _ when creating the CTE name

I can't think offhand of any risks or downstream implications. It will be worth verifying with our automated tests, and of course adding a new one. We have a few existing tests for "models with dots in their names" here and here, since this is a capability that we know some users rely on, and we want to avoid any future regressions.

I'd welcome a PR for this!

@jtcohen6 jtcohen6 removed the triage label May 19, 2022
@miro-ur
Copy link
Author

miro-ur commented Jul 18, 2022

@jtcohen6 Appreciate the guidance and apologies for the delay.
I just forked the repo and will do my best to make the changes and submit a PR.

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jul 14, 2023
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
2 participants