Describe the bug
The column checks on polars LazyFrames are not registering errors when they should. Values outside of a defined range pass validation with no warnings or errors. This is not the case for a polars DataFrame, which does register an error.
It looks like this was addressed in a recent PR but I am still seeing the bug in the 0.19.3 release.
I have checked that this issue has not already been reported.
The issue has been reported and merged to main, but is still persisting in the most recent release
[x] I have confirmed this bug exists on the latest version of pandera.
(optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample
# This code is taken from the examples page [here](https://pandera--1373.org.readthedocs.build/en/1373/polars.html)
# With values changed to be outside the defined range.
import pandera.polars as pa
import polars as pl

schema = pa.DataFrameSchema(
{
"state": pa.Column(str),
"city": pa.Column(str),
"price": pa.Column(int, pa.Check.in_range(min_value=5, max_value=20)), # check is defined
}
)
lf = pl.LazyFrame(
{
"state": ["FL", "FL", "FL", "CA", "CA", "CA"],
"city": [
"Orlando",
"Miami",
"Tampa",
"San Francisco",
"Los Angeles",
"San Diego",
],
"price": [2, 12, 10, 16, 20, 180], # values outside of defined range are passed
}
)
print(schema.validate(lf).collect()) # no errors are raised
Expected behavior
I would expect a pandera.errors.SchemaError to be raised. Note that the polars.DataFrame version of this code does raise an error.
I think this behaviour is expected. pa.Check.in_range(min_value=5, max_value=20) cannot be performed on a pl.LazyFrame object because it requires reading the data.
So are checks never assessed for LazyFrame objects?
I feel like the documentation should make this more explicit, or a warning should be issued. The top example comes directly from the pandera documentation, and having a check that is never assessed creates a false sense of coverage.
Checks are assessed for LazyFrame objects, but only those that don't require the data to be present in memory are evaluated, most importantly data type checks.