Describe the bug
I believe I have discovered a bug, or at least odd behavior, in the ClearML scalar reporting mechanism.
In my data processing task I have a metric that, both in theory and in the implementation, can only ever increase in value. I report this scalar in each iteration of a loop.
However, when viewed in ClearML, the scalar actually drops in value in certain runs of the task. It appears that the ordering of the reported iterations is incorrect, so earlier iterations are actually reported later. This does not happen every time, however.
Additionally, I am confused by the iteration axis in general: my iterations clearly go from 0 to X in increments of 1, yet the plot shows them going from iteration 0 only to something like iteration 6 or 7. So something is incorrect there as well.
Correct report: [screenshot]
Incorrect report: [screenshot]
To reproduce
Create a task in a function
Store a variable starting at 0
Run a loop in the task
Perform a lengthy operation in each iteration (for example, calling a subprocess that does data processing)
Increase the variable by X
Report the variable in each iteration of the loop
Retry these steps a number of times and view the report in ClearML (a condensed sketch of these steps follows).
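For convenience, here is a minimal sketch of the steps above, assuming a reachable ClearML server; the project/task names and the sleep that stands in for the lengthy subprocess call are placeholders:

import time
from clearml import Task

task = Task.init(project_name='debug', task_name='scalar-ordering-repro')  # placeholder names
logger = task.get_logger()

value = 0  # metric that can only ever increase
for i in range(20):
    time.sleep(5)  # stand-in for the lengthy subprocess/data-processing step
    value += 1     # increase the variable by X (here X = 1)
    logger.report_scalar(title='Generator', series='monotonic', value=value, iteration=i)

task.close()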
Code that produced the issue for me
def capture_design(design_folder: str):
    import subprocess, os, shutil
    from clearml import Task
    print(f"Capturing designs from {design_folder}...")
    task = Task.current_task()
    logger = task.get_logger()
    design_files = [f for f in os.listdir(design_folder) if os.path.isfile(os.path.join(design_folder, f))]
    if len(design_files) == 0:
        print(f"No design files found in {design_folder}")
        return
    widgets = {}
    for widget in implemented_types:
        widgets[widget] = 0
    files = []
    errors = 0
    logger.report_scalar(title='Generator', series='total_widgets', value=sum(widgets.values()), iteration=0)
    logger.report_scalar(title='Generator', series='errors', value=errors, iteration=0)
    for widget in widgets:
        logger.report_scalar(title='Widget metrics', series=widget, value=widgets[widget], iteration=0)
    for i, design_file in enumerate(design_files):
        print(f"Iteration: {i+1}/{len(design_files)} - {design_file}")
        attempts = 0
        success = False
        # NOTE Retry mechanism due to possible MemoryErrors when dynamically allocating screenshot data (Trust in the OS to clean up the mess)
        while not success and attempts < 4:
            print(f"Running design generator on file {design_file}")
            gen = subprocess.run([os.path.abspath(env['mpy_path']), os.path.abspath(env['mpy_main']), '-m', 'design', '-o', 'screenshot.jpg', '-f', os.path.abspath(os.path.join(design_folder, design_file)), '--normalize'], cwd=os.path.abspath(os.path.curdir), capture_output=True, text=True)
            if gen.returncode != 0:
                print(f"Failed to generate UI from design file {design_file}:\n{gen.stdout}\n{gen.stderr}")
                attempts += 1
                continue
            success = True
        if not success:
            print(f"Failed to generate UI from design file {design_file} after {attempts} attempts")
            errors += 1
            continue
        tmp_image = os.path.abspath(os.path.join(os.path.abspath(os.path.curdir), "screenshot.jpg"))
        tmp_text = os.path.abspath(os.path.join(os.path.abspath(os.path.curdir), "screenshot.txt"))
        if not os.path.exists(tmp_image) or not os.path.exists(tmp_text):
            print(f"Failed to find generated UI files from design file {design_file}")
            errors += 1
            continue
        gen_image = os.path.abspath(os.path.join(env['output_folder'], f"ui_{i}.jpg"))
        gen_text = os.path.abspath(os.path.join(env['output_folder'], f"ui_{i}.txt"))
        try:
            shutil.move(tmp_image, gen_image)
            shutil.move(tmp_text, gen_text)
        except FileNotFoundError as e:
            print(f"Failed to move files from design file {design_file}:\n{tmp_image} -> {gen_image}\n{tmp_text} -> {gen_text}\n{e}")
            errors += 1
            continue
        files.append((gen_image, gen_text))
        annotation_errors = []
        with open(gen_text, 'r+') as f:
            # Each line is in this format: "class x y w h" (Need to grab class)
            new_lines = []
            for i, line in enumerate(f.readlines()):
                widget, x, y, w, h = line.split(' ')
                x, y, w, h = float(x), float(y), float(w), float(h)
                if any([x < 0.0, y < 0.0, w < 0.0, h < 0.0]) or any([x > 1.0, y > 1.0, w > 1.0, h > 1.0]):
                    errors += 1
                    print(f"[Line {i}] Invalid bounding box found in annotation file of {design_file}")
                    print(f"Removed: {widget} {x} {y} {w} {h}")
                    annotation_errors.append(i)
                    continue
                new_lines.append(line)
                if widget in widgets:
                    widgets[widget] += 1
                else:
                    errors += 1
                    print(f"[Line {i}] Unknown widget class {widget} found in annotation file of {design_file}")
            # NOTE Delete invalid annotations in label file
            f.seek(0)
            f.writelines(new_lines)
            f.truncate()
            del new_lines
        logger.report_scalar(title='Generator', series='total_widgets', value=sum(widgets.values()), iteration=i+1)
        logger.report_scalar(title='Generator', series='errors', value=errors, iteration=i+1)
        for widget in widgets:
            logger.report_scalar(title='Widget metrics', series=widget, value=widgets[widget], iteration=i+1)
    generated_files = len(files)
    env['generated_files'] = generated_files
    env['files'] = files
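To separate a UI rendering problem from an ingestion/ordering problem, the stored scalars can also be pulled back through the client API. A minimal sketch, assuming a recent clearml client; the task ID is a placeholder:

from clearml import Task

task = Task.get_task(task_id='<task-id-of-an-affected-run>')  # placeholder ID
scalars = task.get_reported_scalars()
series = scalars['Generator']['total_widgets']
# 'x' holds the reported iterations, 'y' the values; both should be monotonic here
for x, y in zip(series['x'], series['y']):
    print(x, y)

If the x values come back out of order here as well, the problem would be in ingestion rather than only in the plot rendering.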
Expected behaviour
The scalar plot should display the reported values for each iteration in the order in which they were reported (i.e. each iteration in sequence).
Environment
Related Discussion
https://clearml.slack.com/archives/CTK20V944/p1715875927944579