Scalar return for check in polars-backed model fails on validation with lazy=True #1659

Open
3 tasks done
SandroCasagrande opened this issue May 27, 2024 · 0 comments
Labels
bug Something isn't working


Describe the bug

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

I am using checks that return a scalar on a polars-backed DataFrameModel:

import pandera.polars as pa
import polars as pl

class MyModel(pa.DataFrameModel):
    a: pl.Int64

    @pa.check("a")
    def failing_check(self, _) -> bool:
        return False

MyModel(pl.DataFrame({"a": [1]}), lazy=True)

This results in the following traceback:

Traceback (most recent call last):
  File "/Users/sandro/code/pandera/scalar_returns_polars.py", line 12, in <module>
    MyModel(pl.DataFrame({"a": [1]}), lazy=True)
  File "/Users/sandro/code/pandera/pandera/api/dataframe/model.py", line 138, in __new__
    DataFrameBase[TDataFrameModel], cls.validate(*args, **kwargs)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sandro/code/pandera/pandera/api/dataframe/model.py", line 289, in validate
    cls.to_schema().validate(
  File "/Users/sandro/code/pandera/pandera/api/polars/container.py", line 58, in validate
    output = self.get_backend(check_obj).validate(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sandro/code/pandera/pandera/backends/polars/container.py", line 89, in validate
    results = check(*args)  # type: ignore[operator]
              ^^^^^^^^^^^^
  File "/Users/sandro/code/pandera/pandera/backends/polars/container.py", line 179, in run_schema_component_checks
    result = schema_component.validate(check_obj, lazy=lazy)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sandro/code/pandera/pandera/api/polars/components.py", line 146, in validate
    output = self.get_backend(check_obj).validate(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sandro/code/pandera/pandera/backends/polars/components.py", line 80, in validate
    error_handler = self.run_checks_and_handle_errors(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sandro/code/pandera/pandera/backends/polars/components.py", line 146, in run_checks_and_handle_errors
    error_handler.collect_error(
  File "/Users/sandro/code/pandera/pandera/api/base/error_handler.py", line 69, in collect_error
    else len(schema_error.failure_cases)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'bool' has no len()

Expected behavior

A SchemaErrors exception should be raised, as happens with an analogous pandas-backed model:

import pandas as pd
import pandera as pa

class MyModel(pa.DataFrameModel):
    a: pd.Int64Dtype

    @pa.check("a")
    def failing_check(self, _) -> bool:
        return False

MyModel(pd.DataFrame({"a": [1]}), lazy=True)

which results in

Traceback (most recent call last):
...
pandera.errors.SchemaErrors: {
    "DATA": {
        "DATAFRAME_CHECK": [
            {
                "schema": "MyModel",
                "column": "a",
                "check": "failing_check",
                "error": "Column 'a' failed series or dataframe validator 0: <Check failing_check>"
            }
        ]
    }
}

Desktop (please complete the following information):

  • pandera version: 0.19.3 and c24dda9
  • polars version: 0.20.30

Additional context

The polars backend and the pandas backend handle scalar False check results differently. In the polars backend, failure_cases remains a scalar, which causes the TypeError above when the error handler counts failure cases (with lazy=True). Applying a wrapping analogous to scalar_failure_case in the pandas backend seems to work in the very simple case above, but I'm not sure whether this approach holds for anything beyond my example.
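To illustrate the failure mode and the kind of wrapping described above, here is a minimal pure-Python sketch. It does not use pandera's internals: `count_failure_cases` and `wrap_scalar_failure_case` are hypothetical names, and the one-element list is only a stand-in for whatever frame-like object the polars backend would actually need.

```python
from collections.abc import Sized
from typing import Any


def count_failure_cases(failure_cases: Any) -> int:
    # Same shape as the expression in the traceback: len() is called on
    # anything that is not None, so a bare bool raises TypeError.
    return 0 if failure_cases is None else len(failure_cases)


def wrap_scalar_failure_case(failure_cases: Any) -> Any:
    # Hypothetical helper (an assumption, not pandera's API): leave None and
    # sized containers untouched, wrap any other scalar in a one-element list.
    if failure_cases is None or isinstance(failure_cases, Sized):
        return failure_cases
    return [failure_cases]


# Counting a bare scalar fails the same way as in the traceback.
try:
    count_failure_cases(False)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# After wrapping, the scalar counts as a single failure case.
wrapped_count = count_failure_cases(wrap_scalar_failure_case(False))
```

Whether a plain wrapper like this is sufficient for the polars backend is exactly the open question here; the sketch only shows why counting works once the scalar is wrapped.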

@SandroCasagrande SandroCasagrande added the bug Something isn't working label May 27, 2024