Document different timeout types #4083

Open
wdbaruni opened this issue Jun 14, 2024 · 0 comments
Labels
th/documentation Theme: Related to documentation, including tutorials and API docs

Comments

@wdbaruni
Member

ExecutionTimeout is the maximum time a single execution may take; an execution that runs longer is failed. When an execution fails on one node, even because of ExecutionTimeout, it can be retried on another node. ExecutionTimeout is reset whenever an execution is rescheduled on another node.

TotalTimeout covers the end-to-end time, measured from when the job was submitted. It therefore includes all executions, all retries, and the time the job spends waiting to be scheduled.
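To make the difference between the two clocks concrete, here is a minimal simulation sketch. It is not Bacalhau code: `simulate_job`, `Outcome`, the `scheduling_delay` parameter, and the idea of modelling the retry budget as the length of a list are all assumptions made for illustration. It only mirrors the rules described above: the execution clock restarts for every attempt, while the total clock keeps running from submission.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Outcome:
    status: str         # "completed", "failed_total_timeout" or "failed_retries_exhausted"
    attempts_used: int  # number of executions that were started
    elapsed: float      # seconds since the job was submitted


def simulate_job(
    needed: Sequence[float],
    execution_timeout: Optional[float] = None,
    total_timeout: Optional[float] = None,
    scheduling_delay: float = 0.0,
) -> Outcome:
    """Hypothetical model: needed[i] is how long attempt i would run to finish.

    The number of entries in `needed` stands in for the retry budget.
    """
    elapsed = 0.0
    for attempt, run_time in enumerate(needed, start=1):
        # Time spent being scheduled counts towards the total clock only.
        elapsed += scheduling_delay
        if total_timeout is not None and elapsed >= total_timeout:
            return Outcome("failed_total_timeout", attempt - 1, total_timeout)

        # The execution clock starts from zero for every attempt.
        if execution_timeout is None:
            run_for = run_time
        else:
            run_for = min(run_time, execution_timeout)

        # The total clock can cut an attempt short; the job then fails outright.
        if total_timeout is not None and elapsed + run_for > total_timeout:
            return Outcome("failed_total_timeout", attempt, total_timeout)

        elapsed += run_for
        if run_for == run_time:
            return Outcome("completed", attempt, elapsed)
        # Otherwise the attempt hit the execution timeout; try the next one.

    return Outcome("failed_retries_exhausted", len(needed), elapsed)
```

With a 10-second execution timeout and a 1-second scheduling delay, `simulate_job([50, 8], execution_timeout=10, scheduling_delay=1)` fails the first attempt and completes on the second after 20 seconds. Adding `total_timeout=15` to the same call fails the whole job during the second attempt instead.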

If a user defines only ExecutionTimeout, queueing is not enabled, and timeouts fail individual executions rather than the job itself. The job fails only once it has exhausted all of its retries. Users mainly define ExecutionTimeout when they want to preserve resources and avoid allocating compute to a job for longer than it should need.

If a user defines only QueueTimeout, queueing is enabled, and a job/execution can run indefinitely on a compute node until it completes; Bacalhau will not interrupt it.

If a user defines only TotalTimeout, queueing is disabled. The execution and the job are both marked as failed after the timeout, and there is no room to retry on another node.
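The three single-timeout cases above can be summarised as a small lookup. This is only a restatement of the prose in code form; the `Behaviour` type, its field names, and `SINGLE_TIMEOUT_BEHAVIOUR` are hypothetical and do not correspond to anything in the Bacalhau codebase. Fields describe what happens to a *running* execution; `None` means the question does not apply.

```python
from typing import NamedTuple, Optional


class Behaviour(NamedTuple):
    queueing_enabled: bool
    timeout_fails_execution: bool        # a running execution can be failed by this timeout
    timeout_fails_job: bool              # the job itself is failed when the timeout fires
    retry_after_timeout: Optional[bool]  # can another node retry after the timeout?


# Behaviour when exactly one of the three timeouts is defined.
SINGLE_TIMEOUT_BEHAVIOUR = {
    # Executions time out individually; the job fails only once retries run out.
    "ExecutionTimeout": Behaviour(False, True, False, True),
    # Queueing is on; a running execution is never interrupted.
    "QueueTimeout": Behaviour(True, False, False, None),
    # Execution and job fail together, leaving no room for a retry.
    "TotalTimeout": Behaviour(False, True, True, False),
}


def describe(timeout_name: str) -> str:
    """Render one row of the table as a sentence."""
    b = SINGLE_TIMEOUT_BEHAVIOUR[timeout_name]
    queueing = "enabled" if b.queueing_enabled else "disabled"
    return f"Only {timeout_name}: queueing {queueing}, job fails on timeout: {b.timeout_fails_job}"
```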

I expect a combination of QueueTimeout and TotalTimeout to make the most sense, with TotalTimeout set higher than QueueTimeout. ExecutionTimeout should only make sense for power users who want more control.
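The constraint that TotalTimeout should exceed QueueTimeout could be checked before a job is submitted. The sketch below is a hypothetical helper, not an existing Bacalhau validation: the function name `check_timeouts` and the use of `timedelta` values are assumptions for illustration.

```python
from datetime import timedelta
from typing import List, Optional


def check_timeouts(
    queue_timeout: Optional[timedelta],
    total_timeout: Optional[timedelta],
) -> List[str]:
    """Return a list of problems; an empty list means the combination looks sane."""
    problems: List[str] = []
    if queue_timeout is not None and total_timeout is not None:
        # A job that may wait in the queue for as long as its whole lifetime
        # would have no time left to actually execute.
        if total_timeout <= queue_timeout:
            problems.append("TotalTimeout must be higher than QueueTimeout")
    return problems
```

For example, `check_timeouts(timedelta(minutes=10), timedelta(hours=1))` returns an empty list, while swapping the two arguments reports the problem.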

Reference:

@wdbaruni wdbaruni added the th/documentation Theme: Related to documentation, including tutorials and API docs label Jun 14, 2024
Projects
Status: Next