wandb shows unused labels after COCO transfer-learning #13987

Open
1 of 2 tasks
iokarkan opened this issue Jun 25, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@iokarkan

Search before asking

  • I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

Train, Integrations

Bug

I am doing a transfer-learning training based on the pre-trained COCO YOLOv8 checkpoint, where I am removing 76/80 of the COCO classes from the config and renaming the rest for my own labels to predict, according to my training dataset that contains only those 4 indices.

The logs correctly show:
Overriding model.yaml nc=80 with nc=4

However, checking the produced output during and after training in wandb I am noticing that the Validation-Table table images are showing other COCO class labels, that were excluded during training, with their original COCO names.

I am even renaming the 'person' class label in the YAML config used for training, but I can see bounding boxes with the 'person' class label in the wandb validation table images.

I have checked the trained model checkpoint in many frames similar to the ones in the validation table and the model never seems to predict other COCO labels, as intended, regardless of the confidence threshold set. So the bug seems to be localized in the integration between ultralytics and wandb.
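The check described above can be sketched roughly as follows. This is a minimal sketch, not the exact script used: the helper names, the weights path and image source passed to `collect_predicted_ids`, and the four placeholder class names are all assumptions made for illustration.

```python
from collections import Counter


def unexpected_class_ids(predicted_ids, names):
    """Count predicted class ids that are not keys of the training `names` mapping."""
    return Counter(int(i) for i in predicted_ids if int(i) not in names)


def collect_predicted_ids(weights, source, conf=0.01):
    """Run a trained checkpoint and gather every predicted class id.

    Requires `ultralytics`; `weights` and `source` are hypothetical paths.
    """
    from ultralytics import YOLO  # lazy import keeps the helper above dependency-free

    model = YOLO(weights)
    ids = []
    for result in model.predict(source=source, conf=conf, verbose=False):
        ids.extend(result.boxes.cls.tolist())
    return ids, model.names


# Stand-in data: ids 0-3 are the four kept classes, 56 plays a leftover COCO id.
names = {0: "class_a", 1: "class_b", 2: "class_c", 3: "class_d"}
print(unexpected_class_ids([0.0, 3.0, 1.0], names))    # nothing unexpected
print(unexpected_class_ids([0.0, 56.0, 56.0], names))  # id 56 counted twice
```

With a real checkpoint, the ids and names returned by `collect_predicted_ids` would be passed to `unexpected_class_ids`; an empty counter at a very low confidence threshold matches the behaviour reported here.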

Environment

Ultralytics YOLOv8.1.27 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (NVIDIA GeForce RTX 3060, 12036MiB)
Setup complete ✅

OS Linux-5.15.0-112-generic-x86_64-with-glibc2.35
Environment Linux
Python 3.10.12

matplotlib ✅ 3.9.0>=3.3.0
opencv-python ✅ 4.9.0.80>=4.6.0
pillow ✅ 9.4.0>=7.1.2
pyyaml ✅ 6.0.1>=5.3.1
requests ✅ 2.28.2>=2.23.0
scipy ✅ 1.13.1>=1.4.1
torch ✅ 2.1.0>=1.8.0
torchvision ✅ 0.16.0>=0.9.0
tqdm ✅ 4.66.4>=4.64.0
psutil ✅ 5.9.8
py-cpuinfo ✅ 9.0.0
thop ✅ 0.1.1-2209072238>=0.1.1
pandas ✅ 2.2.2>=1.1.4
seaborn ✅ 0.13.2>=0.11.0

Minimal Reproducible Example

The training script is the following:

import wandb
wandb.login()

from ultralytics import YOLO, settings
from wandb.integration.ultralytics.callback import add_wandb_callback

# View all settings
print(settings)

# Download/load the YOLOv8 model in the _weights folder
model_size = 'n'
model = YOLO(f'../_weights/yolov8{model_size}.pt')

# initialize a wandb project
wandb.init(project="...", name="...", job_type="training")

# track training with wandb
add_wandb_callback(model, enable_model_checkpointing=True)

results = model.train(
    project="...",
    name="...",
    data="...yaml",
    epochs=100,
    patience=50,
    optimizer="Adam",
    seed=7,
    imgsz=1280,
    batch=8,
    device=0,
    dropout=0.0,
    resume=False,
    save_period=10
    )

# finalize the W&B Run
wandb.finish()

# reset all settings to default values
settings.reset()

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@iokarkan iokarkan added the bug Something isn't working label Jun 25, 2024
@glenn-jocher
Member

@iokarkan hello,

Thank you for providing detailed information about your issue. It appears that the problem lies in the integration between Ultralytics YOLOv8 and Weights & Biases (W&B), specifically with the display of unused COCO class labels in the validation table images.

To help us investigate further, could you please provide a minimal reproducible example that includes the specific YAML configuration file you are using? This will allow us to replicate the issue more accurately. You can find guidance on creating a minimal reproducible example here.

Additionally, please ensure that you are using the latest versions of both Ultralytics YOLOv8 and Weights & Biases packages. Sometimes, issues like these are resolved in newer releases.

Here's a quick checklist to verify:

  1. Ensure your ultralytics package is up-to-date:
    pip install --upgrade ultralytics
  2. Ensure your wandb package is up-to-date:
    pip install --upgrade wandb

If the issue persists after updating, please share the YAML configuration and any additional relevant code snippets. This will help us diagnose and address the problem more effectively.

Thank you for your cooperation! 😊

@iokarkan
Author

iokarkan commented Jun 27, 2024

The strategy to reproduce the bug is to use an OpenImagesv7 checkpoint and train on a modified COCO8 dataset with 2 classes.

The venv uses ultralytics==8.1.27 and wandb==0.17.0, as in the original bug description. I noted I do get a warning:

wandb: WARNING This integration is tested and supported for ultralytics v8.0.238 and below.
wandb: WARNING             Please report any issues to https://github.com/wandb/wandb/issues with the tag `yolov8`.

Therefore I also tested with the version reported as supported: changing the requirements line to ultralytics==8.0.238 and re-running seems to give the same wandb result.
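A small version check along these lines can make the mismatch explicit before a run. It is only a sketch: the helper names are made up for illustration, and the `8.0.238` ceiling is copied from the wandb warning quoted above rather than from any official compatibility table.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MAX_TESTED_ULTRALYTICS = "8.0.238"  # taken from the wandb warning text


def parse_version(text):
    """Turn '8.1.27' into (8, 1, 27); stops at the first non-numeric part."""
    parts = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def within_tested_range(installed, max_tested=MAX_TESTED_ULTRALYTICS):
    """True if `installed` is not newer than the version named in the warning."""
    return parse_version(installed) <= parse_version(max_tested)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package in ("ultralytics", "wandb"):
    print(package, installed_version(package))

print(within_tested_range("8.1.27"))   # False
print(within_tested_range("8.0.238"))  # True
```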

After training for 2 epochs, the following shows up in wandb:
[image: wandb screenshot]

From what I understand the extra classes are coming from the OpenImagesv7 dataset, and should not be predicted in my validation at all as I am not using them in my transfer-learning.


Below are the files used in the process:

  • coco8-reduced.yaml
    • This points to a modified coco8 dataset, with only 1 picture kept and the corresponding txt label file keeping only 2 classes, renumbered to 0 and 1 (to match the yaml). I downloaded it and modified it to be able to train, as I could not find a faster way to prepare a dataset for transfer learning:

e.g. 000000000009.txt as

0 0.479492 0.688771 0.955609 0.5955
0 0.736516 0.247188 0.498875 0.476417
0 0.339438 0.418896 0.678875 0.7815
1 0.646836 0.132552 0.118047 0.0969375
1 0.773148 0.129802 0.0907344 0.0972292
1 0.668297 0.226906 0.131281 0.146896
1 0.642859 0.0792187 0.148063 0.148062

coco8-reduced.yaml:

# Ultralytics YOLO 🚀, AGPL-3.0 license
# COCO8 dataset (first 8 images from COCO train2017) by Ultralytics
# Documentation: https://docs.ultralytics.com/datasets/detect/coco8/
# Example usage: yolo train data=coco8.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco8  ← downloads here (1 MB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8 # dataset root dir
train: images/train # train images (relative to 'path') 4 images
val: images/val # val images (relative to 'path') 4 images
test: # test images (optional)

# Classes
names:
  0: cake
  1: potted plant

# Download script/URL (optional)
download: https://ultralytics.com/assets/coco8.zip
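
As a quick sanity check that the label file and the YAML agree, the class ids used in the labels can be compared against the `names` mapping. This is a minimal sketch with a hypothetical helper name; the label rows and the two names are copied from the files above.

```python
def class_ids_in_labels(label_text):
    """Return the set of class ids in a YOLO-format label file.

    Each valid row is 'class_id x_center y_center width height'.
    """
    ids = set()
    for line in label_text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed rows
        ids.add(int(fields[0]))
    return ids


names = {0: "cake", 1: "potted plant"}  # from coco8-reduced.yaml

label_text = """\
0 0.479492 0.688771 0.955609 0.5955
0 0.736516 0.247188 0.498875 0.476417
0 0.339438 0.418896 0.678875 0.7815
1 0.646836 0.132552 0.118047 0.0969375
1 0.773148 0.129802 0.0907344 0.0972292
1 0.668297 0.226906 0.131281 0.146896
1 0.642859 0.0792187 0.148063 0.148062
"""

ids = class_ids_in_labels(label_text)
missing = ids - set(names)  # ids used in labels but absent from the YAML
print(sorted(ids), sorted(missing))  # [0, 1] []
```

An empty `missing` set means every labelled id has a name in the YAML, so the dataset side is not a source of the extra labels.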

The requirements of the venv:

absl-py==2.1.0
asttokens==2.4.1
astunparse==1.6.3
beautifulsoup4==4.12.3
cachetools==5.3.3
certifi==2022.12.7
charset-normalizer==2.1.1
click==8.1.7
comm==0.2.2
contourpy==1.2.1
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
decord==0.6.0
docker-pycreds==0.4.0
exceptiongroup==1.2.0
executing==2.0.1
filelock==3.9.0
fire==0.6.0
flatbuffers==24.3.25
fonttools==4.51.0
fsspec==2023.4.0
gast==0.5.4
gdown==5.1.0
gitdb==4.0.11
GitPython==3.1.43
google-auth==2.29.0
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
grpcio==1.62.1
h5py==3.11.0
idna==3.4
ipykernel==6.29.4
ipython==8.23.0
jedi==0.19.1
Jinja2==3.1.2
jupyter_client==8.6.1
jupyter_core==5.7.2
keras==2.14.0
kiwisolver==1.4.5
libclang==18.1.1
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.4
matplotlib-inline==0.1.7
mdurl==0.1.2
ml-dtypes==0.2.0
mpmath==1.3.0
nest-asyncio==1.6.0
networkx==3.2.1
numpy==1.26.3
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-nvcc-cu11==11.8.89
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==8.7.0.84
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.3.0.86
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusparse-cu11==11.7.5.86
nvidia-nccl-cu11==2.16.5
oauthlib==3.2.2
# NOTE: there are conflicts when both libraries are installed
# https://stackoverflow.com/questions/55313610/importerror-libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directo
opencv-python==4.9.0.80
opencv-python-headless==4.9.0.80
opencv-contrib-python-headless==4.9.0.80
opt-einsum==3.3.0
packaging==24.0
pandas==2.2.2
parso==0.8.4
pexpect==4.9.0
pillow==10.2.0
platformdirs==4.2.0
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
py-cpuinfo==9.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pybboxes==0.1.6
Pygments==2.17.2
pyparsing==3.1.2
PySocks==1.7.1
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.0
requests==2.28.1
requests-oauthlib==2.0.0
retina-face==0.0.16
rich==13.7.1
rsa==4.9
# TODO: change this when 0.11.17 is accepted in PyPI
# sahi==0.11.16
git+https://github.com/obss/[email protected]#egg=sahi
scipy==1.13.0
seaborn==0.13.2
sentry-sdk==2.2.1
setproctitle==1.3.3
shapely==2.0.4
six==1.16.0
smmap==5.0.1
soupsieve==2.5
stack-data==0.6.3
sympy==1.12
tensorboard==2.14.1
tensorboard-data-server==0.7.2
tensorflow==2.14.0
tensorflow-estimator==2.14.0
tensorflow-io-gcs-filesystem==0.36.0
tensorrt==8.5.3.1
termcolor==2.4.0
terminaltables==3.1.10
thop==0.1.1.post2209072238
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.1.1+cu118
torchvision==0.16.1+cu118
tornado==6.4
tqdm==4.66.2
traitlets==5.14.2
triton==2.1.0
typing_extensions==4.8.0
tzdata==2024.1
ultralytics==8.1.27
urllib3==1.26.13
wandb==0.17.0
wcwidth==0.2.13
Werkzeug==3.0.2
wrapt==1.14.1

The training script is the following:

import wandb
wandb.login()

from ultralytics import YOLO, settings
from wandb.integration.ultralytics.callback import add_wandb_callback
settings.update({
    'datasets_dir': '../datasets/',
    'runs_dir': '../runs/',
    })

# View all settings
print(settings)

# Download/load the YOLOv8 model in the _weights folder
model_size = 'n'
model = YOLO(f'../_weights/yolov8{model_size}-oiv7.pt')

# initialize a wandb project
wandb.init(project="ultralytics-issue", name="test", job_type="training")

# track training with wandb
add_wandb_callback(model, enable_model_checkpointing=True)

results = model.train(
    project="ultralytics-issue",
    name="test",
    data="./coco8-reduced.yaml",
    epochs=2,
    patience=50,
    optimizer="Adam",
    seed=7,
    imgsz=640,
    batch=8,
    dropout=0.0,
    resume=False,
    device=0
    )

# finalize the W&B Run
wandb.finish()

# reset all settings to default values
settings.reset()

@glenn-jocher
Member

Hello @iokarkan,

Thank you for providing the detailed information and the reproducible example. It’s very helpful for diagnosing the issue.

From your description and the provided code, it seems that the problem lies in the integration between Ultralytics YOLOv8 and Weights & Biases (W&B), where unused COCO class labels are still appearing in the validation table images.

Steps to Address the Issue:

  1. Verify Package Versions: Ensure you are using the latest versions of both ultralytics and wandb. The warning you received indicates that the integration is tested for ultralytics v8.0.238 and below. However, you mentioned using ultralytics v8.1.27. Please try updating to the latest versions to see if the issue persists:

    pip install --upgrade ultralytics wandb
  2. Check Class Mappings: Ensure that the class mappings in your coco8-reduced.yaml file are correctly set and that the model is properly initialized with the new class labels. It appears you have done this, but double-checking might help.

  3. W&B Callback: The W&B callback should correctly log the new class labels. Ensure that the callback is correctly added and that the enable_model_checkpointing parameter is set to True:

    from wandb.integration.ultralytics import add_wandb_callback
    add_wandb_callback(model, enable_model_checkpointing=True)
  4. Debugging: To further debug, you might want to print out the class labels and predictions during the validation phase to ensure that the model is not predicting the excluded classes:

    results = model.val()
    print(results)

Example Code Snippet:

Here’s a concise example to ensure everything is set up correctly:

import wandb
from ultralytics import YOLO, settings
from wandb.integration.ultralytics.callback import add_wandb_callback

# Initialize W&B
wandb.login()
wandb.init(project="ultralytics-issue", name="test", job_type="training")

# Load the model
model = YOLO('../_weights/yolov8n-oiv7.pt')

# Add W&B callback
add_wandb_callback(model, enable_model_checkpointing=True)

# Train the model
results = model.train(
    project="ultralytics-issue",
    name="test",
    data="./coco8-reduced.yaml",
    epochs=2,
    imgsz=640,
    batch=8,
    device=0
)

# Validate the model
val_results = model.val()
print(val_results)

# Finalize W&B run
wandb.finish()

Additional Resources:

For more detailed guidance on integrating Ultralytics YOLOv8 with Weights & Biases, you can refer to the Ultralytics documentation on W&B integration.

If the issue persists after these steps, please let us know, and we can further investigate. Your cooperation and detailed reporting are greatly appreciated! 😊

@iokarkan
Author

Thank you for taking the time @glenn-jocher.

A couple of remarks:

  • The "extra" / "unused" labels in wandb come from the pre-trained checkpoint in both cases detailed in my posts, so in my latest MRE post the labels come from OIv7, not COCO.
  • I edited my post to say I did change to ultralytics==8.0.238 but the observed wandb behaviour persists.
  • The suggested wandb setup in the training script looks identical to what I posted, please let me know if there's something I missed.
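
To make the first remark concrete, each label seen in the wandb table can be classified by where its name appears. This is a sketch only: the function names are hypothetical, the `checkpoint_names` dictionary is a small stand-in rather than the real OIv7 mapping, and the list of observed labels is invented for illustration.

```python
def label_origin(label, dataset_names, checkpoint_names):
    """Say where a label string seen in W&B most plausibly comes from.

    The dataset mapping is checked first, so a name present in both counts as 'dataset'.
    """
    if label in dataset_names.values():
        return "dataset"
    if label in checkpoint_names.values():
        return "pretrained checkpoint"
    return "unknown"


def load_checkpoint_names(weights):
    """Read the id -> name mapping stored in a checkpoint (requires `ultralytics`)."""
    from ultralytics import YOLO  # lazy import; `weights` is a hypothetical path

    return dict(YOLO(weights).names)


dataset_names = {0: "cake", 1: "potted plant"}  # from coco8-reduced.yaml
checkpoint_names = {0: "Accordion", 1: "Adhesive tape", 2: "Aircraft"}  # stand-in mapping
seen_in_wandb = ["cake", "Aircraft", "zebra"]  # hypothetical observations

for label in seen_in_wandb:
    print(label, "->", label_origin(label, dataset_names, checkpoint_names))
```

In a real run, `load_checkpoint_names` on the pre-trained weights would replace the stand-in dictionary.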

Based on your other suggestion, I validated the model with model.val():

YOLOv8n summary (fused): 168 layers, 3006038 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00, 39.25it/s]
                   all          4          1    0.00182          1     0.0212     0.0129
                  cake          4          1    0.00182          1     0.0212     0.0129

@glenn-jocher
Member

Hello @iokarkan,

Thank you for the detailed follow-up and for providing the additional context. Your observations are very helpful in diagnosing the issue.

Key Points:

  1. Source of Extra Labels: It’s clear that the extra labels in W&B are originating from the pre-trained OpenImagesv7 (OIv7) checkpoint. This indicates that the issue is related to how W&B logs the class labels from the pre-trained model, even after they have been overridden.

  2. Version Verification: You've confirmed that the issue persists with ultralytics==8.0.238, which is within the supported range for W&B integration. This helps narrow down the potential causes.

  3. Validation Results: Your validation results show that the model is indeed predicting only the intended classes (cake), which aligns with your training setup. This further suggests that the issue is specific to the W&B logging rather than the model's predictions.

Next Steps:

To address the issue with W&B logging extra labels, consider the following steps:

  1. Explicit Class Mapping: Ensure that the class mapping is explicitly set in the W&B configuration. This can sometimes help in overriding the default labels from the pre-trained checkpoint.

  2. Custom Callback: You might want to create a custom W&B callback to ensure that only the intended classes are logged. Here’s a quick example of how you might modify the callback:

    from wandb.integration.ultralytics import WandbCallback
    
    class CustomWandbCallback(WandbCallback):
        def on_val_end(self, trainer, pl_module):
            # Custom logic to filter out unwanted labels
            super().on_val_end(trainer, pl_module)
            # Ensure only the intended classes are logged
            trainer.logger.experiment.log({"custom_classes": ["cake", "potted plant"]})
    
    # Use the custom callback
    model.add_callback(CustomWandbCallback())
  3. W&B Support: Since this issue seems to be specific to the W&B integration, it might be beneficial to reach out to W&B support or check their GitHub issues for similar reports. They might have additional insights or fixes for this behavior.
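
One way to read the "explicit class mapping" step is to record the intended classes in the run config, sketched below. The helper names and the `intended_classes` key are assumptions, and nothing in this thread confirms that doing so changes the labels drawn in the validation table; it only makes the intended mapping visible alongside the run.

```python
def class_mapping_payload(names):
    """Build a JSON-friendly description of the intended classes."""
    return {
        "intended_classes": {
            "nc": len(names),
            "names": {str(idx): name for idx, name in sorted(names.items())},
        }
    }


def record_class_mapping(names):
    """Store the mapping in the active W&B run config (requires `wandb`)."""
    import wandb  # lazy import so the payload helper works without wandb installed

    if wandb.run is None:
        raise RuntimeError("call wandb.init() before recording the class mapping")
    wandb.config.update(class_mapping_payload(names))


names = {0: "cake", 1: "potted plant"}  # from coco8-reduced.yaml
print(class_mapping_payload(names))
```

`record_class_mapping(names)` would be called after `wandb.init(...)` and before `model.train(...)`.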

Conclusion:

The issue appears to be with how W&B logs class labels from the pre-trained checkpoint. By ensuring explicit class mapping and possibly using a custom callback, you can mitigate this issue. If the problem persists, reaching out to W&B support would be a prudent next step.

Thank you for your patience and detailed reporting. If you have any further questions or need additional assistance, feel free to ask. 😊
