Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#7329: Do not sort columns on df.update #7330

Merged
merged 3 commits into from
Jun 28, 2024

Conversation

YarShev
Copy link
Collaborator

@YarShev YarShev commented Jun 26, 2024

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves BUG: columns mismatch after df.update #7329
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@@ -3886,7 +3886,7 @@ def n_ary_op(
1,
new_right_frames,
join_type,
sort=True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was sort=True here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add more tests because you are changing the behavior of a low-level function that is used to implement many high-level functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption is that someone was thinking this should work for all cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What tests would you suggest to add?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have pretty extensive test suite for checking high-level functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption is that someone was thinking this should work for all cases.

Why do you think that this should always work without sorting now? My doubts, for example, are based on this line:

sort=not self.get_axis(axis).equals(other.get_axis(axis)),

What tests would you suggest to add?

I don't have any specific functions, you can add for any few you find.

We already have pretty extensive test suite for checking high-level functions.

I'm not sure we wrote many tests taking into account the order of the columns.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, handling sort here is a bit tricky. Changed the logic and added a test for some bin op.

Signed-off-by: Igoshev, Iaroslav <[email protected]>
sort=True,
sort=(
not all(
self.get_axis(1).equals(right.get_axis(1))
Copy link
Collaborator

@arunjose696 arunjose696 Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to move this logic to def _copartition and do this there if sort is None?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, done

anmyachev
anmyachev previously approved these changes Jun 27, 2024
Signed-off-by: Igoshev, Iaroslav <[email protected]>
Copy link
Collaborator

@devin-petersohn devin-petersohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @YarShev !

@YarShev YarShev merged commit 759d548 into modin-project:main Jun 28, 2024
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: columns mismatch after df.update
4 participants