[Feature]: Continuous health checks on services with reactions like pod-replacements #1349

Open
devidw opened this issue Jun 21, 2024 · 1 comment

devidw commented Jun 21, 2024

Problem

The only reliable way to know that a service is healthy is to test it, i.e. to perform a health check that does a minimal amount of the work the service is supposed to do.

Since this is an end-to-end test, it's a good indicator that the service is really healthy and can take load.

If something is wrong in the network, for example, the health check will fail. Standard restart policies that only trigger on container exit do not account for that kind of failure.
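A minimal sketch of such an end-to-end check, in Python. Everything named here is an assumption for illustration (the `check_health` and `http_fetch` helpers, the URL, the expected response body); none of it is an existing dstack API. The point is only that the check goes through the network and looks at the actual response, so network problems count as failures too.

```python
import http.client
import urllib.request
from typing import Callable, Tuple


def http_fetch(url: str, timeout: float) -> Tuple[int, bytes]:
    """Send a GET request and return (status code, body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def check_health(
    url: str,
    expected: bytes = b"ok",  # hypothetical marker of a correct response
    timeout: float = 5.0,
    fetch: Callable[[str, float], Tuple[int, bytes]] = http_fetch,
) -> bool:
    """Return True only if the service answers correctly within the timeout."""
    try:
        status, body = fetch(url, timeout)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, HTTP error status,
        # malformed URL: all of these count as a failed check.
        return False
    return status == 200 and expected in body
```

In practice the URL would point at an endpoint that does a small piece of real work, so that a passing check means the service can actually handle requests.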

Solution

Because of this, it would be extremely helpful to have health check support in dstack, along with configuration options for how to react to changes in health status.

In order to react, it would be helpful to have a config option that sets how many failed checks mark a service as unhealthy, for example 3 failed checks.

Then one reaction could be to try to restart the pod

Another reaction could be to remove the pod and replace it with a new one

Basically the idea is to always ensure the configured number of replicas is really healthy
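The threshold and the two reactions could be sketched roughly like this in Python. This is an illustration under assumptions, not a proposed dstack design: the names (`HealthTracker`, `Reaction`, `replicas_to_add`, `max_restarts`) are hypothetical, failures are counted consecutively, and the policy of trying a restart before replacing the pod is just one possible choice.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Reaction(Enum):
    NONE = "none"
    RESTART = "restart"
    REPLACE = "replace"


@dataclass
class HealthTracker:
    failure_threshold: int = 3  # failed checks before a pod counts as unhealthy
    max_restarts: int = 1       # assumed policy: restart this often, then replace
    failures: Dict[str, int] = field(default_factory=dict)
    restarts: Dict[str, int] = field(default_factory=dict)

    def record(self, pod_id: str, healthy: bool) -> Reaction:
        """Record one health check result and return the reaction to take."""
        if healthy:
            # A passing check resets the consecutive-failure counter.
            self.failures[pod_id] = 0
            return Reaction.NONE
        self.failures[pod_id] = self.failures.get(pod_id, 0) + 1
        if self.failures[pod_id] < self.failure_threshold:
            return Reaction.NONE
        # Threshold reached: start a fresh counting window after reacting.
        self.failures[pod_id] = 0
        if self.restarts.get(pod_id, 0) < self.max_restarts:
            self.restarts[pod_id] = self.restarts.get(pod_id, 0) + 1
            return Reaction.RESTART
        return Reaction.REPLACE


def replicas_to_add(desired: int, healthy_count: int) -> int:
    """How many new pods are needed to get back to the configured replica count."""
    return max(0, desired - healthy_count)
```

With the defaults above, a pod that fails three checks in a row is restarted once, and if it then fails three more checks in a row it is replaced.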

Workaround

I created https://github.com/devidw/gingo to perform health checks and then restart, add, or remove pods based on the health status of the pods in a configured cluster.

It can be extended by writing other connectors; currently it only has a RunPod one.

Would you like to help us implement this feature by sending a PR?

No

devidw added the feature label Jun 21, 2024
@peterschmidt85
Contributor

@devidw, gonna discuss this with the team next week and get back to you with an update on when we can support this. Stay tuned.
