Feat: Adding more pyarrow types to pandas engine #1676
Comments
Hi @aaravind100 the prototype looks good, can you make a PR? Will just have to add some unit tests.
I'll leave that to you and others in the community to prioritize :) Which ones are left that are currently unsupported?
@cosmicBboy created PR #1699
These types are compatible with pandas but have not been added yet. I'll try adding some next week.
+1. Came looking for
Workaround
The below seems to work as a workaround for me for now.
import pandas as pd
import pandera as pa
import datetime as dt
from pandera.engines.pandas_engine import Engine, immutable, pyarrow, dtypes, DataType

# Register a custom pandera dtype so that date64[pyarrow] columns can be validated.
@Engine.register_dtype(
    equivalents=[
        "date64[pyarrow]",
        pyarrow.date64,
        pd.ArrowDtype(pyarrow.date64()),
    ]
)
@immutable
class ArrowDate64(DataType, dtypes.Date):
    """Semantic representation of a :class:`pyarrow.date64`."""

    type = pd.ArrowDtype(pyarrow.date64())
    bit_width: int = 64

class DFSchema(pa.DataFrameModel):
    """Schema for a dataframe of jobs from the endpoint
    https://algodon.de-prod.dk/api/hadrian/joblist/{environment}
    """

    model: str = pa.Field()
    notationtime: ArrowDate64 = pa.Field()
    value: int = pa.Field()

df = pd.DataFrame({
    "model": ["A", "B", "A", "B"],
    "notationtime": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "value": [1, 2, 3, 4],
})
# Parse the strings as datetimes, then cast the column to the Arrow-backed date64 dtype.
df.notationtime = pd.to_datetime(df.notationtime).astype("date64[pyarrow]")
# Calling the model on the dataframe validates it against the schema.
DFSchema(df)
Is your feature request related to a problem? Please describe.
I'd like to continue to add some of the remaining pyarrow types to the pandas engine. I've come across these two apart from the existing types.
Describe the solution you'd like
Extend pandas_engine with ArrowList and ArrowStruct types. I do have a working prototype here and can raise a PR.
Additional context
Would you like to add or prioritize some other types from here?