[BUG] Memory leak when using add_coco_labels for instance segmentation with coco_id_field set #4407

h-fernand opened this issue May 22, 2024 · 3 comments
Labels
bug Bug fixes

Comments


h-fernand commented May 22, 2024

Describe the problem

When I add COCO-format instance segmentation predictions to my dataset with add_coco_labels, the program rapidly consumes RAM until it exhausts memory and crashes. This only happens when I set coco_id_field to coco_id so that my annotations are matched to the correct samples. If I omit coco_id_field and let the function run with its default behavior, the annotations get mismatched, but the program uses far less RAM and actually finishes. The same erroneous behavior occurs if I pass add_coco_labels a view containing only the test split instead of the whole dataset.

Code to reproduce issue

import json

import fiftyone as fo
import fiftyone.utils.coco as fouc

dataset_name = "dataset"
splits = ['train', 'val', 'test']
dataset_root = '/path/to/dataset/root'
annotations_dir = 'annotations'
annfile_template = 'instances_{split}.json'

predictions_file = '/path/to/predictions/file.json'

combined_dataset = fo.Dataset(name=dataset_name, persistent=True)

for split in splits:
    print(f"Loading: {split} dataset")

    annfile = f"{dataset_root}/{annotations_dir}/{annfile_template.format(split=split)}"
    data_path = f"{dataset_root}/{split}"
    split_dataset_name = f"ground_truth_{split}"

    split_dataset = fo.Dataset.from_dir(
        data_path=data_path,
        labels_path=annfile,
        dataset_type=fo.types.COCODetectionDataset,
        name=split_dataset_name,
        include_id=True,
        persistent=True
    )
    split_dataset.tag_samples(split)
    combined_dataset.merge_samples(split_dataset)

with open(predictions_file, 'r') as f:
    prediction_data = json.load(f)

predictions = prediction_data['annotations']
classes = prediction_data['categories']
classes = [x['name'] for x in classes]

fouc.add_coco_labels(
    combined_dataset,
    "predictions",
    predictions,
    classes,
    label_type="segmentations",
    coco_id_field="coco_id",
)

System information

  • OS Platform and Distribution: Linux Ubuntu 22.04
  • Python version: Python 3.10.12
  • FiftyOne version (fiftyone --version): v0.23.8
  • FiftyOne installed from (pip or source): pip

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?

  • Yes. I can contribute a fix for this bug independently
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community
  • No. I cannot contribute a bug fix at this time
@h-fernand h-fernand added the bug Bug fixes label May 22, 2024
h-fernand (Author)

As an update, it appears that this dramatic memory usage occurs whenever the function runs to completion, regardless of how it is called. The only reason it did not eat all of the RAM without coco_id_field set is that the annotations were created in the wrong order; once the annotation order is fixed, the memory leak occurs there too. I'm convinced this is a memory leak because my prediction annotation file is only 2GB, there are only 1000 images in the test set that I'm adding predictions to, and the program still ends up eating all of the RAM on a system with 256GB of RAM.
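Until a fix lands, one possible mitigation is to feed the predictions to add_coco_labels in batches, so peak memory scales with a batch rather than the full annotation list. This is an untested sketch: batches_by_image is a hypothetical helper (not part of FiftyOne), and it assumes that keeping all of an image's annotations in a single call leaves each sample's labels intact.

```python
from collections import defaultdict


def batches_by_image(annotations, images_per_batch):
    """Yield batches of COCO annotation dicts, keeping every annotation
    for a given image_id together in the same batch."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann["image_id"]].append(ann)

    image_ids = list(groups)
    for start in range(0, len(image_ids), images_per_batch):
        batch = []
        for image_id in image_ids[start:start + images_per_batch]:
            batch.extend(groups[image_id])
        yield batch


# Hypothetical usage against the repro script above:
# for batch in batches_by_image(predictions, 100):
#     fouc.add_coco_labels(
#         combined_dataset, "predictions", batch, classes,
#         label_type="segmentations", coco_id_field="coco_id",
#     )
```

Since each image appears in exactly one batch, repeated add_coco_labels calls should not need to merge labels across batches for the same sample, but that behavior is worth verifying before relying on it.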

brimoor (Contributor) commented May 23, 2024

@h-fernand this sounds similar to the issue reported in #4293, which was resolved by #4354.

(FYI the above patch will be released in fiftyone==0.24.0 which is scheduled for next week)

h-fernand (Author)

That's great news. I'll try the patch once it's released, and hopefully it resolves the issue; I'll post an update in this thread once I've tested it.
