Convert polars DataFrame back to datasets #6984

Open
ljw20180420 opened this issue Jun 19, 2024 · 1 comment · May be fixed by #6986
Labels
enhancement New feature or request

Comments

@ljw20180420

Feature request

Converting a Dataset to a polars DataFrame and back raises an error:

from datasets import Dataset

dsdf = Dataset.from_dict({"x": [[1, 2], [3, 4, 5]], "y": ["a", "b"]})
Dataset.from_polars(dsdf.to_polars())

ValueError: Arrow type large_list<item: int64> does not have a datasets dtype equivalent.

Motivation

When a dataset contains a Sequence column, converting it to polars produces the Arrow type large_list. The reverse mapping (from large_list back to Sequence) is not implemented, so the round trip fails.

Your contribution

No

@ljw20180420 ljw20180420 added the enhancement New feature or request label Jun 19, 2024
@arthasking123 arthasking123 linked a pull request Jun 19, 2024 that will close this issue
@lhoestq
Member

lhoestq commented Jun 25, 2024

Hi! Thanks for reporting :)

We don't support large_list yet, though IMO it should be added to Sequence (maybe with a parameter large=True?).

2 participants