Enable mutation of the output of nodes in a linear fashion via decorators #922

Open
skrawcz opened this issue May 27, 2024 · 0 comments
Labels
decorators enhancement New feature or request

Comments

Collaborator

skrawcz commented May 27, 2024

Is your feature request related to a problem? Please describe.
This is similar to #701 -- but a distributed version.

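People don't want to change column/dataframe/artifact names as they transform them. This conflicts with Hamilton, where each function name is a node name, so every step needs a distinct name. Is there some API we can support?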

Describe the solution you'd like
One idea is that you pipe transforms together and have the framework group things so that there isn't a renaming issue.

E.g.

def data_set(...) -> pd.DataFrame:
  ...
  return df

@mutate
def _data_set(data_set: pd.DataFrame) -> pd.DataFrame:
  # some mutation applied to data_set
  return data_set

@mutate
def _data_set(data_set: pd.DataFrame) -> pd.DataFrame:
  # some other mutation, applied to the output of the previous one
  return data_set

Notes:

  1. A Python module can only expose one function per name: the last one defined wins.
  2. This means that anything we want to use downstream can only be defined once.
  3. The mutating functions above are prefixed with _, which is the convention for private functions. I think that's fine, because these transform functions aren't useful by themselves and shouldn't be exposed directly. It also gets around the naming issue in (1): the decorator can register and capture each function as it is defined. Open decision: what "declares" the connection to the target function? The first argument name, the name of the function, or something else?
  4. Order matters. The idea is that the decorator builds up an ordered list of transforms. This lets one experiment by commenting functions out while developing.
  5. When Hamilton inspects this module, it pulls data_set and then checks what was registered against it via @mutate. One initial constraint could be that @mutate has to be used in the same module as the function it targets; we should design for multi-module, but constrain to same-module as a first pass.
  6. Hamilton would then render this correctly, exposing those nodes in the graph, and expose data_set as the result of applying all of those transforms.

Describe alternatives you've considered
Alternative / additional decorator use could be:

@mutate("data_set")
def _some_func_name(arg1: pd.DataFrame) -> pd.DataFrame:
  # assumes arg1 maps to data_set?
  # some other mutation
  return arg1

This would let function names differ from the name of the node they mutate.
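
One way the explicit-target form could work is as a decorator factory. This is only a sketch under assumptions: the `registry` dict and `apply_mutations` helper are hypothetical, and it settles the open "does arg1 map to data_set?" question by simply passing the target's output as the first positional argument. Lists stand in for dataframes.

```python
import collections
import inspect
from typing import Any, Callable, Dict, List

# Hypothetical registry keyed by the explicit target name.
registry: Dict[str, List[Callable[..., Any]]] = collections.defaultdict(list)


def mutate(target: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator factory: register the decorated function as a transform of `target`."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        params = list(inspect.signature(func).parameters)
        if not params:
            raise TypeError(f"{func.__name__} needs a parameter to receive {target!r}")
        registry[target].append(func)
        return func

    return decorator


@mutate("data_set")
def _add_one(arg1: list) -> list:
    return [x + 1 for x in arg1]


@mutate("data_set")
def _double(values: list) -> list:
    return [x * 2 for x in values]


def apply_mutations(target: str, value: Any) -> Any:
    """Apply the transforms registered for `target` in registration order."""
    for func in registry[target]:
        # Assumption: the target's output binds to the first parameter,
        # whatever that parameter is called.
        value = func(value)
    return value


result = apply_mutations("data_set", [1, 2, 3])
```

Because the target is named in the decorator, the two functions can have unrelated names and unrelated parameter names.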

This could also let one register mutations in a loop, as below. I think that is a potential vote for allowing multi-module registration, with module order then implying the order in which transforms are applied.

for helper_func in helper_funcs:
    mutate.register("data_set", helper_func)
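
A rough sketch of what `mutate.register` plus module-ordered application might look like. None of this exists in Hamilton; the `_Mutate` class, its `transforms` method, and the `module_order` argument are hypothetical. The helper functions have their `__module__` set by hand to simulate coming from two modules named `cleaning` and `filtering`.

```python
import collections
from typing import Any, Callable, Dict, List, Sequence


class _Mutate:
    """Hypothetical stand-in for a `mutate` object that exposes `register`."""

    def __init__(self) -> None:
        self._registry: Dict[str, List[Callable[[Any], Any]]] = collections.defaultdict(list)

    def register(self, target: str, func: Callable[[Any], Any]) -> None:
        self._registry[target].append(func)

    def transforms(self, target: str, module_order: Sequence[str]) -> List[Callable[[Any], Any]]:
        """Sort by the defining module's position; ties keep registration order."""
        rank = {name: i for i, name in enumerate(module_order)}
        # sorted() is stable, so functions from the same module stay in the
        # order they were registered. Unknown modules sort last.
        return sorted(
            self._registry[target],
            key=lambda f: rank.get(f.__module__, len(rank)),
        )


mutate = _Mutate()


def strip_whitespace(values: list) -> list:
    return [v.strip() for v in values]


def lowercase(values: list) -> list:
    return [v.lower() for v in values]


def drop_empty(values: list) -> list:
    return [v for v in values if v]


# Simulate helpers that were defined in two different modules.
strip_whitespace.__module__ = "cleaning"
lowercase.__module__ = "cleaning"
drop_empty.__module__ = "filtering"

helper_funcs = [drop_empty, strip_whitespace, lowercase]  # deliberately out of module order
for helper_func in helper_funcs:
    mutate.register("data_set", helper_func)

ordered = mutate.transforms("data_set", module_order=["cleaning", "filtering"])
value = ["  A ", "   ", "b"]
for transform in ordered:
    value = transform(value)
```

Here the module order passed in by the caller decides that both `cleaning` transforms run before `drop_empty`, even though `drop_empty` was registered first.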

Additional context

Here's some code that proves you can intercept and register the functions in a mutate decorator.

import collections
import functools

# Maps "<module>.<function name>" to the transforms registered under it, in definition order.
function_registry = collections.defaultdict(list)


def mutate(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    # Build the key from the original name, before it is modified below
    qualified_key = func.__module__ + "." + func.__name__
    # Modify the function's name: suffix it with its registration index
    func.__name__ = func.__name__ + str(len(function_registry[qualified_key]))
    wrapper.__name__ = func.__name__

    # Register the function
    function_registry[qualified_key].append(wrapper)

    return wrapper

Then, assuming the code above is saved as decorator.py:

from decorator import mutate
import pandas as pd

def my_function(input: pd.Series) -> pd.Series:
    return input

@mutate
def _my_function(my_function: pd.Series) -> pd.Series:
    return my_function + 1

@mutate
def _my_function(my_function: pd.Series) -> pd.Series:
    return my_function * 2
@skrawcz skrawcz added enhancement New feature or request decorators labels May 27, 2024