SchemaFieldNotFoundError with custom check function if no alias is provided. #1657

Open
3 tasks done
philiporlando opened this issue May 24, 2024 · 0 comments
Labels
bug Something isn't working

philiporlando commented May 24, 2024

Describe the bug

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

I'm trying to validate a polars dataframe using a custom check function.

import polars as pl
import pandera.polars as pa


# Custom check function
def check_custom_condition(df: pa.PolarsData) -> pl.LazyFrame:
    return df.lazyframe.select(
        pl.when(
            pl.col("column1").is_null()
            & pl.col(df.key).is_null()
        )
        .then(False)
        .otherwise(True)
        # .alias("check_result")  # Uncomment this line to avoid the issue
    )

# Define the schema for the DataFrame
schema = pa.DataFrameSchema({
    "column1": pa.Column(
        dtype=str,
        nullable=True,
    ),
    "column2": pa.Column(
        dtype=str,
        nullable=True,
        checks=[
            pa.Check(check_fn=check_custom_condition),
        ],
    ),
})

# Example DataFrame
data = {
    "column1": [None, "x", "y"],
    "column2": ["a", None, "c"]
}

df = pl.DataFrame(data)

# Validate the DataFrame using the schema and custom check
schema.validate(df, lazy=True)

The example above produces the following error:

{
    "DATA": {
        "CHECK_ERROR": [
            {
                "schema": null,
                "column": "column2",
                "check": "check_custom_condition",
                "error": "SchemaFieldNotFoundError(\"literal\")"
            }
        ]
    }
}

Expected behavior

I would expect the schema validation to run successfully here. When we uncomment the .alias("check_result") line, the schema validation runs without error. I'm trying to understand if this behavior is expected, or if this is a bug.

Desktop (please complete the following information):

  • OS: Windows 10 & Ubuntu 22.04.4
  • Browser: Chrome
  • Version: pandera==0.19.3
@philiporlando philiporlando added the bug Something isn't working label May 24, 2024