Is your feature request related to a problem? Please describe.
This is similar to #701 -- but a distributed version.
People don't want to change column/dataframe/artifact names. This conflicts with Hamilton. Is there some API we can support?
Describe the solution you'd like
One idea is that you pipe transforms together and have the framework group things so that there isn't a renaming issue.
E.g.
```python
def data_set(...) -> pd.DataFrame:
    ...
    return df

@mutate
def _data_set(data_set: pd.DataFrame) -> pd.DataFrame:
    # some mutation
    return df

@mutate
def _data_set(data_set: pd.DataFrame) -> pd.DataFrame:
    # some other mutation
    return df
```
Notes:
1. Python modules can only expose one function with a given name -- the last one defined.
   - This means that anything we want to use downstream can only be defined once.
2. The mutating functions above are prefixed with `_`, which is reserved for private functions. I think that is fine, because these transform functions aren't useful by themselves and shouldn't be exposed directly. It also gets around the naming issue in (1): the decorator can register and capture them. Open decision: what "declares" the connection to the target function -- the first argument name, the name of the function, or something else?
3. Order matters. The idea is that the decorator builds up an ordered list of transforms. This allows one to experiment with commenting out functions etc. while developing.
4. When Hamilton inspects this module, it pulls `data_set` and then checks what was registered against it via `@mutate`. One initial constraint could be that `@mutate` functions have to be in the same module as their target. We should design for multi-module, but as a first pass constrain to the same module.
5. Hamilton would then render this correctly, exposing those nodes in the graph, and expose `data_set` as the result of applying all those transforms.
Describe alternatives you've considered
Alternative / additional decorator use could be:
```python
@mutate("data_set")
def _some_func_name(arg1: pd.DataFrame) -> pd.DataFrame:
    # assumes arg1 maps to data_set ?
    # some other mutation
    return df
```
This would let the function name differ from the name of the target it mutates.
This could then help one to write mutations like this -- which I think is a potential vote for allowing multi-module registration -- and using module order to imply the order in which transforms are applied.
Additional context
Here's some code that proves you can intercept and register the functions in mutate decorator.
Then: