Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: OnDemandFeatureView type inference for array types #4310

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

alexmirrington
Copy link

@alexmirrington alexmirrington commented Jun 24, 2024

What and Why

OnDemandFeatureView.feature_transformation.infer_features should be able to infer features from python types for all supported feast data types, for all transformation backends. This PR passes values to python_type_to_feast_value_type so that list types can be inferred correctly.

Tests

  • Unit tests passing in CI for all Python versions on my fork.
  • Tested against a local project by patching PandasTransformation.infer_features, e.g.
from feast.transformation.pandas_transformation import PandasTransformation

def __patched_infer_features(
    self, random_input: dict[str, list[Any]]
) -> list[Field]:
    df = pd.DataFrame.from_dict(random_input)
    output_df: pd.DataFrame = self.transform(df)
    return [
        Field(
            name=f,
            dtype=from_value_type(
                python_type_to_feast_value_type(
                    f, value=output_df[f].tolist()[0], type_name=str(dt)
                )
            ),
        )
        for f, dt in zip(output_df.columns, output_df.dtypes)
    ]

PandasTransformation.infer_features = __patched_infer_features

Note to reviewers
If you can point me to where I could add some feature repository tests to catch any regressions in the future that would be great 🙏

Fixes

Fixes #4308

@alexmirrington alexmirrington changed the title Fix OnDemandFeatureView type inference for array types fix: OnDemandFeatureView type inference for array types Jun 24, 2024
@alexmirrington alexmirrington marked this pull request as ready for review June 24, 2024 10:49
@tokoko
Copy link
Collaborator

tokoko commented Jun 24, 2024

I think the best way for this to be tested would be in respective unit tests: test_on_demand_pandas_transformation.py, test_on_demand_python_transformation.py and test_substrait_transformation.py

@alexmirrington alexmirrington force-pushed the fix-odfv-type-inference branch 2 times, most recently from 3defb2a to 9594997 Compare June 25, 2024 05:19
@alexmirrington
Copy link
Author

@tokoko Added some tests for lists of bools, strings, floats and ints for both python and pandas ODFVs, as well as a request source in there. LMK what you think!

@@ -155,6 +155,7 @@ def python_type_to_feast_value_type(
"uint16": ValueType.INT32,
"uint8": ValueType.INT32,
"int8": ValueType.INT32,
"bool_": ValueType.BOOL,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this type bool_?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a numpy bool type, but harder to identify since we use type(value).__name__, e.g.

>>> import pandas as pd
>>> data = pd.DataFrame({"a": [True, False]})
>>> type(data["a"].iloc[0])
<class 'numpy.bool_'>
>>> type(data["a"].iloc[0]).__name__
'bool_'

We need this mapping for boolean features in pandas transformations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha. add a simple doc would be better.

@@ -44,7 +44,9 @@ def infer_features(self, random_input: dict[str, list[Any]]) -> list[Field]:
Field(
name=f,
dtype=from_value_type(
python_type_to_feast_value_type(f, type_name=type(dt[0]).__name__)
python_type_to_feast_value_type(
f, value=dt[0], type_name=type(dt[0]).__name__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the list be empty and index out of range?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's certainly possible if the transformation written by the user doesn't return a list of length 1 - but I think under the assumption that the transformation is written correctly, then we wouldn't expect index errors. Perhaps we could add some validation checks in here with some nice error messages for users, instead of just bubbling up IndexErrors during UDF development?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validation sounds good to me

@alexmirrington
Copy link
Author

alexmirrington commented Jun 27, 2024

@HaoXuAI FYI seeing a few rate limits for bigtable and redshift in integration tests, doesn't seem related to my changes but maybe you know who best could take a look?

assert type(result["acc_rate"]) == float
assert type(result["avg_daily_trips"]) == int
# On-demand view
assert type(result["avg_daily_trips_plus_one"]) == int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these type asserts? The fact that you provided a schema above should be enough, as feast itself checks that inferred and provided schemas match? wdyt?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to remove them if you think they are cluttering tests

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I suppose checking return values can also matter in some cases

@tokoko
Copy link
Collaborator

tokoko commented Jun 28, 2024

@alexmirrington Can you make a dummy commit to force a rerun?
@HaoXuAI Some tests still need First-time contributor approval

Copy link
Collaborator

@tokoko tokoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants