[#860] Adding Spurious Correlation feature #1140

Open
wants to merge 9 commits into
base: master

Conversation

allincowell
Contributor

Summary

🎯 Purpose: Adding Spurious Correlation feature for Image datasets.

📜 Example Usage: Computes a correlation score between each image property (e.g., dark score, blurry score, information score, size, aspect_ratio) and the class labels. The score is based on metrics such as baseline accuracy and held-out accuracy, obtained by fitting a univariate model on the property.

  1. Added the spurious_correlation.py module under cleanlab/datalab/internal.
  2. Added a private instance method, _spurious_correlation, to the Datalab class; it uses an instance of the SpuriousCorrelations class.
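
The scoring idea described in the summary can be illustrated with a small sketch. This is not the code from this PR: the nearest-class-mean classifier, the interleaved fold scheme, and the function names `baseline_accuracy` and `held_out_accuracy` are assumptions chosen to keep the example dependency-free. It fits a one-feature model on a single image property and reports its held-out accuracy next to the accuracy of always predicting the majority class.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence


def baseline_accuracy(labels: Sequence[int]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def held_out_accuracy(values: Sequence[float], labels: Sequence[int], k: int = 5) -> float:
    """k-fold held-out accuracy of a nearest-class-mean classifier on ONE feature.

    Hypothetical stand-in for the univariate model mentioned in the summary.
    """
    if k < 2:
        raise ValueError("k must be at least 2 so every fold has training data")
    n = len(values)
    fold_accuracies: List[float] = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))  # interleaved folds
        if not test_idx:
            continue
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        # "Fit": mean property value per class, using training rows only.
        class_means = {
            c: mean(values[i] for i in train_idx if labels[i] == c)
            for c in {labels[i] for i in train_idx}
        }
        # "Predict": assign each held-out row to the class with the closest mean.
        correct = sum(
            min(class_means, key=lambda c: abs(values[i] - class_means[c])) == labels[i]
            for i in test_idx
        )
        fold_accuracies.append(correct / len(test_idx))
    return mean(fold_accuracies)


# Made-up toy data: class 1 images are consistently darker than class 0 images.
dark_scores = [0.10, 0.90, 0.12, 0.88, 0.15, 0.85, 0.11, 0.91, 0.13, 0.87]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

gain = held_out_accuracy(dark_scores, labels) - baseline_accuracy(labels)
print(f"accuracy gain over baseline: {gain:.2f}")
```

A held-out accuracy well above the baseline suggests the property alone predicts the label, which is the kind of spurious correlation this feature is meant to surface. How the PR turns these two accuracies into its final score is not shown here.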

Links to Relevant Issues or Conversations

Issue Link: #860
Earlier PR attempt: #872

@jwmueller requested a review from elisno June 13, 2024 22:07
@@ -635,3 +636,64 @@ def load(path: str, data: Optional[Dataset] = None) -> "Datalab":
        load_message = f"Datalab loaded from folder: {path}"
        print(load_message)
        return datalab

    def _spurious_correlation(
        self, properties_of_interest: Optional[List[str]] = None
Member

For now, let's omit this argument in Datalab._spurious_correlation().

Remember to remove the parameter in the docstring as well.

odd_aspect_ratio_score 0.900000
"""
        try:
            issues = self.get_issues()
Member

Please add a validation step that ensures the issues dataframe has all the relevant (image-specific) scores.
If it doesn't, an error with a helpful message should be raised.

Contributor Author

Added a validation step here to check that all vision/image issues are present in the correlations dataframe.

Member

To be clear, the issues dataframe should be validated, not the correlations_df.
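
One possible shape for the validation requested in this thread is sketched below. It is an illustration, not the code from the PR: the helper name `validate_image_issue_columns` and the tuple `IMAGE_SCORE_COLUMNS` are hypothetical, and the column names are assumed from the properties listed in the PR summary plus `odd_aspect_ratio_score`, which appears in the docstring above. The helper takes column names rather than a dataframe, so it checks the issues dataframe's columns before any correlations are computed.

```python
from typing import Iterable, Sequence

# Assumed names of the image-specific score columns in the issues dataframe.
# Adjust this tuple to whatever the image issue checks actually produce.
IMAGE_SCORE_COLUMNS = (
    "dark_score",
    "blurry_score",
    "low_information_score",
    "odd_size_score",
    "odd_aspect_ratio_score",
)


def validate_image_issue_columns(
    columns: Iterable[str],
    required: Sequence[str] = IMAGE_SCORE_COLUMNS,
) -> None:
    """Raise a ValueError naming every required score column that is absent."""
    present = set(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(
            "The issues dataframe is missing image-specific score columns: "
            f"{', '.join(missing)}. Run the image issue checks before "
            "computing spurious correlations."
        )
```

Inside `_spurious_correlation` this could be called as `validate_image_issue_columns(issues.columns)` right after `issues = self.get_issues()`, so that a missing score fails early with a message listing exactly which columns are absent.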

codecov bot commented Jun 15, 2024

Codecov Report

Attention: Patch coverage is 39.62264% with 32 lines in your changes missing coverage. Please review.

Project coverage is 95.75%. Comparing base (18dfb0d) to head (454f980).

Current head 454f980 differs from pull request most recent head 2b01057

Please upload reports for the commit 2b01057 to get more accurate results.

Files                                               Patch %   Lines
cleanlab/datalab/internal/spurious_correlation.py   45.23%    23 Missing ⚠️
cleanlab/datalab/datalab.py                         18.18%    9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1140      +/-   ##
==========================================
+ Coverage   94.34%   95.75%   +1.41%     
==========================================
  Files          80       81       +1     
  Lines        6100     6153      +53     
  Branches     1079     1019      -60     
==========================================
+ Hits         5755     5892     +137     
+ Misses        261      168      -93     
- Partials       84       93       +9     

@allincowell requested a review from elisno June 18, 2024 19:46