Dynamic GPU/Queue Allocation for Workers in ClearML #1283

Open
jabuarab opened this issue Jun 11, 2024 · 0 comments

Description:

I am looking for guidance on:

Whether it's possible to configure multiple workers on the same machine with specific GPU assignments.
How to set up conditional queues that check GPU availability.
Suggestions or best practices for implementing a service-task to manage GPU allocation dynamically.
I am currently using ClearML on a machine with two GPUs. I have configured a single worker that utilizes both GPUs and assigned a queue specifically for this worker. However, I want to improve the flexibility of resource allocation and task management. Specifically, I would like to achieve the following setup:

Three Workers on the Same Machine:

One worker assigned to GPU 0.
One worker assigned to GPU 1.
One worker that can utilize both GPUs.
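If it helps clarify, the setup I am imagining is three clearml-agent daemons on the same machine, each pinned to specific GPUs with the --gpus flag and listening on its own queue (queue names below are just placeholders):

```bash
# Placeholder queue names; each daemon would register as a separate worker on this machine.
clearml-agent daemon --detached --queue gpu0_queue --gpus 0
clearml-agent daemon --detached --queue gpu1_queue --gpus 1
clearml-agent daemon --detached --queue both_gpus_queue --gpus 0,1
```

As far as I understand, these three agents would be independent of each other, so nothing would stop the both-GPUs worker from starting a task while one of the single-GPU workers is busy; that coordination is what the conditional queues / service-task below are meant to address.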

Conditional Queues:

Use queues that can check GPU availability or whether a task is already running on the GPUs before enqueuing new tasks.
Additionally, I have considered creating a service-task that checks GPU availability before enqueuing tasks into the appropriate queues, potentially managing this with a fourth queue.
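As a rough sketch of the availability check I have in mind, the service-task could use pynvml and treat a GPU as free when no compute process is running on it (this is just an assumption on my side, not an existing ClearML feature):

```python
# Sketch only: treat a GPU as "free" when it has no compute processes.
# Uses pynvml (the nvidia-ml-py bindings).
import pynvml

def free_gpus():
    """Return the set of GPU indices with no compute process running."""
    pynvml.nvmlInit()
    try:
        free = set()
        for idx in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
            if not pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
                free.add(idx)
        return free
    finally:
        pynvml.nvmlShutdown()
```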

Steps to Reproduce

Set up a machine with 2 GPUs.
Install and configure ClearML with one worker that uses both GPUs.
Attempt to create three separate workers with the described GPU assignments.
Explore the possibility of setting up conditional queues or a service-task to manage GPU availability.

Expected Behavior

The system should be able to dynamically assign tasks to the appropriate worker based on GPU availability:

If GPU 0 is free, assign the task to the worker using GPU 0.
If GPU 1 is free, assign the task to the worker using GPU 1.
If both GPUs are free, assign the task to the worker using both GPUs.
If a task is enqueued on the both-GPUs queue while the other workers are running tasks, wait for them to finish and make sure they don't start new tasks.
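To make this concrete, the routing policy I am picturing for the service-task would be roughly the following sketch. The queue names and the free_gpus() idle check are the same placeholders as above, and pending tasks are assumed to arrive as draft task IDs (in practice they could be pulled from a fourth staging queue instead of a plain list):

```python
# Sketch of the routing policy above, as a service-task loop.
# Assumptions on my side: queue names are placeholders, and a GPU counts as
# idle when it has no compute processes (same check as the earlier sketch).
import time
import pynvml
from clearml import Task

QUEUES = {0: "gpu0_queue", 1: "gpu1_queue", "both": "both_gpus_queue"}

def free_gpus():
    # Same idle criterion as the earlier sketch, inlined so this runs standalone.
    pynvml.nvmlInit()
    try:
        return {i for i in range(pynvml.nvmlDeviceGetCount())
                if not pynvml.nvmlDeviceGetComputeRunningProcesses(
                    pynvml.nvmlDeviceGetHandleByIndex(i))}
    finally:
        pynvml.nvmlShutdown()

def route(single_gpu_task_ids, dual_gpu_task_ids, poll_seconds=30):
    single, dual = list(single_gpu_task_ids), list(dual_gpu_task_ids)
    while single or dual:
        free = free_gpus()
        if dual:
            # While a dual-GPU task is waiting, hold back single-GPU work so the
            # gpu0/gpu1 workers drain and both devices eventually become idle.
            if free >= {0, 1}:
                Task.enqueue(Task.get_task(task_id=dual.pop(0)),
                             queue_name=QUEUES["both"])
        elif single and free:
            gpu = sorted(free)[0]
            Task.enqueue(Task.get_task(task_id=single.pop(0)),
                         queue_name=QUEUES[gpu])
        time.sleep(poll_seconds)
```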

Actual Behavior

Currently, there is only one worker that utilizes both GPUs, which limits the flexibility in task management and GPU utilization.

Environment

OS: Ubuntu 22.04.4 LTS
ClearML Version: WebApp: 1.15.0-472 • Server: 1.15.0-472 • API: 2.29
GPU: 2 x Tesla V100-FHHL-16GB

Thank you for your assistance!
