Problem creating datasets with Azure storage when multi file #1285

Open
SamGalanakis opened this issue Jun 19, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@SamGalanakis

I am using Azure for all storage, and authentication works. For example:

from clearml import Dataset
dataset = Dataset.create(dataset_name="sanity_test", dataset_project="LOGOCube")

dataset.add_files("README.md")
dataset.upload()
dataset.finalize()

This works fine and I can see the file on Azure. I also tried larger single files with no issue.

But when I run the provided example, which adds a folder of files:

# Download CIFAR dataset and create a dataset with ClearML's Dataset class
from clearml import StorageManager, Dataset

manager = StorageManager()

dataset_path = manager.get_local_copy(
    remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
)

dataset = Dataset.create(
    dataset_name="cifar_dataset", dataset_project="dataset_examples"
)

# Prepare and clean data here before it is added to the dataset

dataset.add_files(path=dataset_path)

# Dataset is uploaded to the ClearML Server by default
dataset.upload()

dataset.finalize()

It logs the folder and some metadata, then starts throwing errors:

python clearml_dataset_creation.py 
ClearML results page: https://app.clear.ml/projects/14ccfc7a20f54b02b2539ba3b36da47c/experiments/bfeb2290824d4adbb2b67e22236ea53d/output/log
ClearML dataset page: https://app.clear.ml/datasets/simple/14ccfc7a20f54b02b2539ba3b36da47c/experiments/bfeb2290824d4adbb2b67e22236ea53d
Generating SHA2 hash for 8 files
100%|█████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 72.26it/s]
Hash generation completed
2024-06-19 09:11:01,093 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /dataset_examples/.datasets/cifar_dataset/cifar_dataset.bfeb2290824d4adbb2b67e22236ea53d/metrics/HTML/readme.html/HTML_readme.html_00000000.html (403): <?xml version="1.0" encoding="utf-8"?>
<Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:ed761ce3-201e-0024-4428-c29c5e000000
Time:2024-06-19T09:10:58.9893281Z</Message><AuthenticationErrorDetail>Authentication scheme Bearer is not supported in this version.</AuthenticationErrorDetail></Error>
2024-06-19 09:11:01,093 - clearml.metrics - WARNING - Failed uploading to https://clearmltest.blob.core.windows.net/clearml (Failed uploading object /dataset_examples/.datasets/cifar_dataset/cifar_dataset.bfeb2290824d4adbb2b67e22236ea53d/metrics/HTML/readme.html/HTML_readme.html_00000000.html (403): <?xml version="1.0" encoding="utf-8"?>
<Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:ed761ce3-201e-0024-4428-c29c5e000000
Time:2024-06-19T09:10:58.9893281Z</Message><AuthenticationErrorDetail>Authentication scheme Bearer is not supported in this version.</AuthenticationErrorDetail></Error>)
2024-06-19 09:11:01,094 - clearml.metrics - ERROR - Not uploading 1/5 events because the data upload failed
Uploading dataset changes (8 files compressed to 162.15 MiB) to azure://clearmltest.blob.core.windows.net/clearml
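
The useful part of the 403 response is buried in the XML body. As a rough sketch, the body can be parsed with the Python standard library to pull out the error code and the authentication detail. The helper name `summarize_azure_error` is made up for illustration and is not part of ClearML or the Azure SDK; the XML is copied from the log above.

```python
import xml.etree.ElementTree as ET

# Error body copied verbatim from the log output above.
AZURE_ERROR_BODY = """<?xml version="1.0" encoding="utf-8"?>
<Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:ed761ce3-201e-0024-4428-c29c5e000000
Time:2024-06-19T09:10:58.9893281Z</Message><AuthenticationErrorDetail>Authentication scheme Bearer is not supported in this version.</AuthenticationErrorDetail></Error>"""


def summarize_azure_error(body: str) -> dict:
    """Hypothetical helper: extract the key fields from an Azure Storage error body."""
    # Parse from bytes so the encoding declaration in the XML prolog is honoured.
    root = ET.fromstring(body.encode("utf-8"))
    message = root.findtext("Message", default="")
    return {
        "code": root.findtext("Code"),
        # The first line of <Message> is the human-readable part;
        # the RequestId and Time lines follow it.
        "message": message.splitlines()[0] if message else "",
        "detail": root.findtext("AuthenticationErrorDetail"),
    }


if __name__ == "__main__":
    for key, value in summarize_azure_error(AZURE_ERROR_BODY).items():
        print(f"{key}: {value}")
```

For this log, the `detail` field carries the "Authentication scheme Bearer is not supported in this version." text, which is probably the most relevant line to quote when reporting the problem.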

What is the issue here?

I am using the following config:

sdk {
    development {
        default_output_uri: azure://clearmltest.blob.core.windows.net/clearml/
    }
    azure.storage {
        containers: [
            {
                account_name: ${AZURE_STORAGE_ACCOUNT}
                account_key: ${AZURE_STORAGE_KEY}
                container_name: clearml
            }
        ]
    }
}
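
To rule out the account key itself, one option is to upload a small blob with the Azure SDK directly, outside ClearML. This is only a sketch under assumptions: it needs the `azure-storage-blob` package (v12 API), it reads the same `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` environment variables as the config above, and the function names and the blob name `clearml_auth_check.txt` are made up for illustration.

```python
import os


def build_account_url(account_name: str) -> str:
    """Build the blob endpoint URL for a storage account (public Azure cloud assumed)."""
    return f"https://{account_name}.blob.core.windows.net"


def check_shared_key_upload(container_name: str = "clearml") -> None:
    """Hypothetical check: upload and delete a tiny blob using the account key."""
    # Imported lazily so the helper above works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    account_name = os.environ["AZURE_STORAGE_ACCOUNT"]
    account_key = os.environ["AZURE_STORAGE_KEY"]

    # Passing the account key as the credential selects shared-key authentication.
    service = BlobServiceClient(
        account_url=build_account_url(account_name),
        credential=account_key,
    )
    container = service.get_container_client(container_name)

    blob_name = "clearml_auth_check.txt"  # illustrative name only
    container.upload_blob(name=blob_name, data=b"auth check", overwrite=True)
    container.delete_blob(blob_name)
    print("Shared-key upload and delete succeeded")


if __name__ == "__main__":
    check_shared_key_upload()
```

If this succeeds while the ClearML upload of the metrics files still returns 403, the key is likely fine and the difference lies in how that particular request is authenticated.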
@SamGalanakis SamGalanakis added the bug Something isn't working label Jun 19, 2024
@jkhenning
Member

Hi @SamGalanakis, this may be related to the Azure Python package version. Can you share the Python package versions you're using?
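
A short way to collect just the relevant versions, instead of a full `pip freeze`, is sketched below. The package list is a guess at what matters here (`clearml` plus a few common Azure SDK packages) and the function name `report_versions` is made up; adjust both as needed.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages worth reporting; not taken from the thread.
PACKAGES = ["clearml", "azure-storage-blob", "azure-core", "azure-identity"]


def report_versions(packages: list[str]) -> dict[str, str]:
    """Return the installed version of each package, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, installed in report_versions(PACKAGES).items():
        print(f"{name}=={installed}")
```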

@SamGalanakis
Author

Hi @jkhenning, here is the pip freeze output:

requirements.txt

@SamGalanakis
Author

Also, I see that it does store the main data but fails on some metadata / auxiliary files.

RequestId:48002e24-301e-005a-022e-c20c19000000
Time:2024-06-19T09:52:41.5150691Z</Message><AuthenticationErrorDetail>Authentication scheme Bearer is not supported in this version.</AuthenticationErrorDetail></Error>)
2024-06-19 09:52:43,794 - clearml.metrics - ERROR - Not uploading 1/5 events because the data upload failed
Uploading dataset changes (8 files compressed to 162.15 MiB) to azure://clearmltest.blob.core.windows.net/clearml
File compression and upload completed: total size 162.15 MiB, 1 chunk(s) stored (average size 162.15 MiB)

@SamGalanakis
Author

@jkhenning Any update on this?
