Delete all validation files since they may be old #318

Draft
wants to merge 13 commits into
base: master

Conversation

hrshdhgd
Contributor

Fixes #317

@matentzn
Collaborator

All of them? Oh wow, we must have not tested roundtripping for ages!

See https://github.com/mapping-commons/sssom-py/blob/validate-data-files/tests/test_conversion.py#L68

Can you reintroduce some of these checks? I felt quite protected having them there!

@hrshdhgd
Contributor Author

hrshdhgd commented Sep 26, 2022

OK, so when I put freshly created files in the validate_data folder and run the validation check, it still fails (literally the same file, copy-pasted). I dug deeper, and the root cause is in the filecmp.cmp() function:

    s1 = _sig(os.stat(f1))
    s2 = _sig(os.stat(f2))
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False
    if shallow and s1 == s2:
        return True
    if s1[1] != s2[1]:
        return False

Apparently s1 is not == s2.

s1 = (32768, 98892, 1664213293.0535548)
s2 = (32768, 98892, 1664212433.827459)

os.stat(f1) = os.stat_result(st_mode=33152, st_ino=16032794, st_dev=16777225, st_nlink=1, st_uid=502, st_gid=20, st_size=98892, st_atime=1664213294, st_mtime=1664213293, st_ctime=1664213293)

os.stat(f2) = os.stat_result(st_mode=33152, st_ino=49568904, st_dev=16777225, st_nlink=1, st_uid=502, st_gid=20, st_size=98892, st_atime=1664213022, st_mtime=1664212433, st_ctime=1664213021)

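So the third element of the signature does not match, and hence the shallow check does not treat the files as equal. Comparing against the os.stat() output, that element is st_mtime, the last modification time (st_atime differs as well, but it is not part of the signature).

stat.ST_MTIME
Time of last modification.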

The documentation says:
"If shallow is true and the os.stat() signatures (file type, size, and modification time) of both files are identical, the files are taken to be equal."
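To see how a timestamp mismatch plays out in practice, here is a minimal sketch. The file names are made up, and `signature()` is a hypothetical helper that mirrors the tuple compared in the excerpt above. It assumes CPython's `filecmp`, where a signature mismatch does not end the comparison: the call falls through to a size check and then a byte-by-byte comparison.

```python
import filecmp
import os
import stat
import tempfile
from pathlib import Path


def signature(path):
    """Hypothetical helper: the (file type, size, mtime) tuple used by the shallow check."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        original = str(Path(tmp, "original.tsv"))
        copy = str(Path(tmp, "copy.tsv"))        # same bytes, different mtime
        altered = str(Path(tmp, "altered.tsv"))  # same size and mtime, different bytes

        Path(original).write_bytes(b"a\tb\n1\t2\n")
        Path(copy).write_bytes(b"a\tb\n1\t2\n")
        Path(altered).write_bytes(b"a\tb\n2\t1\n")

        # Pin (atime, mtime) explicitly so the outcome does not depend on timing.
        os.utime(original, (1_000_000, 1_000_000))
        os.utime(copy, (2_000_000, 2_000_000))
        os.utime(altered, (1_000_000, 1_000_000))

        return {
            "signatures_equal": signature(original) == signature(copy),
            # Signatures differ, so cmp() goes on to compare the contents.
            "shallow_copy": filecmp.cmp(original, copy, shallow=True),
            # Signatures match, so the shallow check reports equality without reading.
            "shallow_altered": filecmp.cmp(original, altered, shallow=True),
            # shallow=False always compares the contents.
            "deep_altered": filecmp.cmp(original, altered, shallow=False),
        }


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

In this sketch the copy with a different mtime still compares equal, because the contents are identical. If that holds for our files too, a `False` result would point to a real difference in the contents rather than to the timestamps alone.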

The modification time will never match in our case.
Would this be the correct way of checking files? Should we just write a different test for these files?

Update:

  • I implemented checks on just file type and size (dropping modification time) for now. Ideally shallow=False should do the same, but it seems that every time a file is generated, some of the lines get shuffled compared to the previous time it was generated. We can discuss further in our meeting.
  • Windows tests fail
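If the shuffled lines turn out to be the only difference, one option would be an order-insensitive comparison. The sketch below is not what this PR implements; `same_lines_any_order` is a hypothetical helper that treats each file as a multiset of lines. Reading in text mode also normalizes line endings, though whether that is related to the Windows failures is only a guess.

```python
from collections import Counter
from pathlib import Path


def same_lines_any_order(path_a, path_b):
    """Return True if both files contain the same lines, ignoring their order.

    Duplicate lines are counted, so a line that appears twice in one file
    must appear twice in the other.
    """
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    return Counter(lines_a) == Counter(lines_b)
```

Note that this also ignores the position of header and metadata lines, so it is weaker than a byte-for-byte check. Making the writer emit rows in a stable order would let the stricter comparison come back.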

@matentzn
Collaborator

matentzn commented Feb 4, 2024

@hrshdhgd can you update this PR?

matentzn marked this pull request as draft February 4, 2024 10:41
Development

Successfully merging this pull request may close these issues.

Clean up tests/data/validation_data
2 participants