Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding type hints for cleanlab/filter #598

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

unna97
Copy link
Contributor

@unna97 unna97 commented Jan 10, 2023

No description provided.

@unna97 unna97 marked this pull request as draft January 10, 2023 21:48
@codecov
Copy link

codecov bot commented Jan 10, 2023

Codecov Report

Merging #598 (ce7d561) into master (5eb89fc) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head ce7d561 differs from pull request most recent head aa549e7. Consider uploading reports for the commit aa549e7 to get more accurate results

@@           Coverage Diff           @@
##           master     #598   +/-   ##
=======================================
  Coverage   96.23%   96.23%           
=======================================
  Files          60       60           
  Lines        4705     4707    +2     
  Branches      817      817           
=======================================
+ Hits         4528     4530    +2     
  Misses         91       91           
  Partials       86       86           
Impacted Files Coverage Δ
cleanlab/filter.py 99.02% <100.00%> (+<0.01%) ⬆️

cleanlab/filter.py Outdated Show resolved Hide resolved
@jwmueller jwmueller requested a review from elisno January 13, 2023 02:48
Adding the types for variables within the functions. type cheatsheet:
1. np array of float type: npt.NDArray["np.floating[T]"]
2. np array of int type: npt.NDArray[np.int_]
3. np array of bool type: npt.NDArray[np.bool_]
4. np.array of either bool or int: npt.NDArray[Union[np.bool_, np.int_]]
Copy link
Contributor Author

@unna97 unna97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elisno Can you approve the np arrays as I have been using them? (cheatsheet in the commit message)

@@ -298,16 +298,16 @@ class 0, 1, ..., K-1.
# Create `prune_count_matrix` with the number of examples to remove in each class and
# leave at least min_examples_per_class examples per class.
# `prune_count_matrix` is transposed relative to the confident_joint.
prune_count_matrix = _keep_at_least_n_per_class(
prune_count_matrix:npt.NDArray[np.int_] = _keep_at_least_n_per_class(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prune_count_matrix changes the data type (from an array of int to float). Should I declare it as a float? Although the function here returns int explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to take a better look later today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elisno did you have a chance to look at this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where the datatype should change in this function. I think this annotation is fine for now.

confident_joint: Optional[np.ndarray] = None,
n_jobs: Optional[int] = None,
verbose: bool = False,
) -> np.ndarray:
) -> npt.NDArray[Union[np.bool_, np.int_]]:
#TODO: add docstring in the same format as other functions with args and returns
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I add to-dos wherever the doc doesn't have the same format as the other functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the "shape"-format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format of doc itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh right, now I see what you mean! Thank you for pointing this out!

That's not related to this PR, so you shouldn't add this particular to-do in this PR.
Feel free to open an issue!
If you find other instances, you can add them to the same issue.

We want to address these kinds of docstrings in a separate PR.

@elisno
Copy link
Member

elisno commented Jun 29, 2023

I've updated some of the type-hints to pass more of the strict checks (not 100% just yet).

CI is now failing on "Test without optional dependencies and with minimum compatible versions of dependencies", where numpy.typing is not recognized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants