Dynamic GPU/Queue Allocation for Workers in ClearML #1283

Open
jabuarab opened this issue Jun 11, 2024 · 0 comments

Description:

I am looking for guidance on:

Whether it's possible to configure multiple workers on the same machine with specific GPU assignments.
How to set up conditional queues that check GPU availability.
Suggestions or best practices for implementing a service-task to manage GPU allocation dynamically.
I am currently using ClearML on a machine with two GPUs. I have configured a single worker that utilizes both GPUs and assigned a queue specifically for this worker. However, I want to improve the flexibility of resource allocation and task management. Specifically, I would like to achieve the following setup:

Three Workers on the Same Machine:

One worker assigned to GPU 0.
One worker assigned to GPU 1.
One worker that can utilize both GPUs.
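If it helps clarify, the setup I am imagining is three clearml-agent daemons on the same machine, each pinned to specific GPUs with the --gpus flag and listening on its own queue (queue names below are just placeholders):

```bash
# Placeholder queue names; each daemon would register as a separate worker on this machine.
clearml-agent daemon --detached --queue gpu0_queue --gpus 0
clearml-agent daemon --detached --queue gpu1_queue --gpus 1
clearml-agent daemon --detached --queue both_gpus_queue --gpus 0,1
```

As far as I understand, these three agents would be independent of each other, so nothing would stop the both-GPUs worker from starting a task while one of the single-GPU workers is busy; that coordination is what the conditional queues / service-task below are meant to address.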

Conditional Queues:

Use queues that can check GPU availability or whether a task is already running on the GPUs before enqueuing new tasks.
Additionally, I have considered creating a service-task that checks GPU availability before enqueuing tasks into the appropriate queues, potentially managing this with a fourth queue.
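As a rough sketch of the availability check I have in mind, the service-task could use pynvml and treat a GPU as free when no compute process is running on it (this is just an assumption on my side, not an existing ClearML feature):

```python
# Sketch only: treat a GPU as "free" when it has no compute processes.
# Uses pynvml (the nvidia-ml-py bindings).
import pynvml

def free_gpus():
    """Return the set of GPU indices with no compute process running."""
    pynvml.nvmlInit()
    try:
        free = set()
        for idx in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
            if not pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
                free.add(idx)
        return free
    finally:
        pynvml.nvmlShutdown()
```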

Steps to Reproduce

Set up a machine with 2 GPUs.
Install and configure ClearML with one worker that uses both GPUs.
Attempt to create three separate workers with the described GPU assignments.
Explore the possibility of setting up conditional queues or a service-task to manage GPU availability.

Expected Behavior

The system should be able to dynamically assign tasks to the appropriate worker based on GPU availability:

If GPU 0 is free, assign the task to the worker using GPU 0.
If GPU 1 is free, assign the task to the worker using GPU 1.
If both GPUs are free, assign the task to the worker using both GPUs.
If a task is enqueued on the both-GPUs queue while the other workers are running tasks, wait for them to finish and make sure they don't start new tasks.
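To make this concrete, the routing policy I am picturing for the service-task would be roughly the following sketch. The queue names and the free_gpus() idle check are the same placeholders as above, and pending tasks are assumed to arrive as draft task IDs (in practice they could be pulled from a fourth staging queue instead of a plain list):

```python
# Sketch of the routing policy above, as a service-task loop.
# Assumptions on my side: queue names are placeholders, and a GPU counts as
# idle when it has no compute processes (same check as the earlier sketch).
import time
import pynvml
from clearml import Task

QUEUES = {0: "gpu0_queue", 1: "gpu1_queue", "both": "both_gpus_queue"}

def free_gpus():
    # Same idle criterion as the earlier sketch, inlined so this runs standalone.
    pynvml.nvmlInit()
    try:
        return {i for i in range(pynvml.nvmlDeviceGetCount())
                if not pynvml.nvmlDeviceGetComputeRunningProcesses(
                    pynvml.nvmlDeviceGetHandleByIndex(i))}
    finally:
        pynvml.nvmlShutdown()

def route(single_gpu_task_ids, dual_gpu_task_ids, poll_seconds=30):
    single, dual = list(single_gpu_task_ids), list(dual_gpu_task_ids)
    while single or dual:
        free = free_gpus()
        if dual:
            # While a dual-GPU task is waiting, hold back single-GPU work so the
            # gpu0/gpu1 workers drain and both devices eventually become idle.
            if free >= {0, 1}:
                Task.enqueue(Task.get_task(task_id=dual.pop(0)),
                             queue_name=QUEUES["both"])
        elif single and free:
            gpu = sorted(free)[0]
            Task.enqueue(Task.get_task(task_id=single.pop(0)),
                         queue_name=QUEUES[gpu])
        time.sleep(poll_seconds)
```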

Actual Behavior

Currently, there is only one worker that utilizes both GPUs, which limits the flexibility in task management and GPU utilization.

Environment

OS: Ubuntu 22.04.4 LTS
ClearML Version: WebApp: 1.15.0-472 • Server: 1.15.0-472 • API: 2.29
GPU: 2 x Tesla V100-FHHL-16GB

Thank you for your assistance!
