-
Notifications
You must be signed in to change notification settings - Fork 955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: OnDemandFeatureView type inference for array types #4310
base: master
Are you sure you want to change the base?
fix: OnDemandFeatureView type inference for array types #4310
Conversation
feac39c
to
6b7afb9
Compare
I think the best way for this to be tested would be in respective unit tests: test_on_demand_pandas_transformation.py, test_on_demand_python_transformation.py and test_substrait_transformation.py |
3defb2a
to
9594997
Compare
Signed-off-by: Alex Mirrington <[email protected]>
9594997
to
62208ea
Compare
@tokoko Added some tests for lists of bools, strings, floats and ints for both python and pandas ODFVs, as well as a request source in there. LMK what you think! |
@@ -155,6 +155,7 @@ def python_type_to_feast_value_type( | |||
"uint16": ValueType.INT32, | |||
"uint8": ValueType.INT32, | |||
"int8": ValueType.INT32, | |||
"bool_": ValueType.BOOL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this type bool_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a numpy bool type, but harder to identify since we use type(value).__name__
, e.g.
>>> import pandas as pd
>>> data = pd.DataFrame({"a": [True, False]})
>>> type(data["a"].iloc[0])
<class 'numpy.bool_'>
>>> type(data["a"].iloc[0]).__name__
'bool_'
We need this mapping for boolean features in pandas transformations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha. add a simple doc would be better.
@@ -44,7 +44,9 @@ def infer_features(self, random_input: dict[str, list[Any]]) -> list[Field]: | |||
Field( | |||
name=f, | |||
dtype=from_value_type( | |||
python_type_to_feast_value_type(f, type_name=type(dt[0]).__name__) | |||
python_type_to_feast_value_type( | |||
f, value=dt[0], type_name=type(dt[0]).__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the list be empty and index out of range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's certainly possible if the transformation written by the user doesn't return a list of length 1 - but I think under the assumption that the transformation is written correctly, then we wouldn't expect index errors. Perhaps we could add some validation checks in here with some nice error messages for users, instead of just bubbling up IndexErrors during UDF development?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validation sounds good to me
@HaoXuAI FYI seeing a few rate limits for bigtable and redshift in integration tests, doesn't seem related to my changes but maybe you know who best could take a look? |
assert type(result["acc_rate"]) == float | ||
assert type(result["avg_daily_trips"]) == int | ||
# On-demand view | ||
assert type(result["avg_daily_trips_plus_one"]) == int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these type asserts? The fact that you provided a schema above should be enough, as feast itself checks that inferred and provided schemas match? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to remove them if you think they are cluttering tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I suppose checking return values can also matter in some cases
@alexmirrington Can you make a dummy commit to force a rerun? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What and Why
OnDemandFeatureView.feature_transformation.infer_features
should be able to infer features from python types for all supported feast data types, for all transformation backends. This PR passes values topython_type_to_feast_value_type
so that list types can be inferred correctly.Tests
PandasTransformation.infer_features
, e.g.Note to reviewers
If you can point me to where I could add some feature repository tests to catch any regressions in the future that would be great 🙏
Fixes
Fixes #4308