Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-5705] Support for degraded health #5741

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

seonWKim
Copy link
Contributor

@seonWKim seonWKim commented Jun 8, 2024

Motivation:

Add support for degraded health.

Modifications:

  • Add HealthStatus enum which can represent the health status of a server
  • Add healthStatus() method in HealthChecker for fine-grained representation of server's status
  • Add degradedResponse which shows that the server's status is being degraded.

I think adding new health check service which returns HealthStatus(instead of boolean) might be an alternate option. But because this PR is before any reviews, I've just implemented feature on top of HealthCheckService.

Result:

Users can now receive notification for degraded server status like below

{"healthy":true, "degraded":true}

@seonWKim seonWKim force-pushed the feature/ISSUE-5705-degradable-health-checker branch from f20ed57 to 665ec13 Compare June 8, 2024 08:20
Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @seonWKim! 🙇

@seonWKim seonWKim force-pushed the feature/ISSUE-5705-degradable-health-checker branch from 52fdd7b to aed0a93 Compare June 14, 2024 01:24
Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall 👍 I think some client-side changes will be needed, but I prefer I do this to ensure compatibility with xDS-side requirements. Left some minor comments 🙇

final boolean useLongPolling;
if (longPollingTimeoutMillis > 0) {
final String expectedState =
Ascii.toLowerCase(req.headers().get(HttpHeaderNames.IF_NONE_MATCH, ""));
if ("\"healthy\"".equals(expectedState) || "w/\"healthy\"".equals(expectedState)) {
useLongPolling = isHealthy;
useLongPolling = healthStatus == HealthStatus.HEALTHY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this change means if HEALTHY or UNHEALTHY isn't set, then long polling never works since armeria client only sends healthy or unhealthy depending on the status code.

headers = builder.add(HttpHeaderNames.IF_NONE_MATCH,
wasHealthy ? "\"healthy\"" : "\"unhealthy\"")

Copy link
Contributor Author

@seonWKim seonWKim Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Then, I think we can change the logic as follows:

When the expectedState is

  • healthy: use long polling when the status is HEALTHY or DEGRADED
  • unhealthy: use long polling when the status is UNHEALTHY, UNDER_MAINTENANCE, STOPPING

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood the objective of this issue to propagate the status to clients.
If we only maintain expectedState, I think clients may not receive the status on update for long polling health check requests.

If we would like to maintain backwards compatibility, we could define a separate syntax for status matching.

e.g.

  • If-None-Match=[healthy|unhealthy]: propagates a signal whenever a status changes between healthy <-> unhealthy
  • If-None-Match=status=[HEALTHY|DEGRADED|UNHEALTHY|UNDER_MAINTENANCE|STOPPING]: propagates a signal when the status does not match the status exactly

Copy link
Contributor Author

@seonWKim seonWKim Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added If-None-Match=status=[HEALTHY|DEGRADED|UNHEALTHY|UNDER_MAINTENANCE|STOPPING] and preserved If-None-Match=[healthy|unhealthy] in order to maintain backward compatibility.

  • When If-None-Match=[healthy|unhealthy] is present, return a response when the server becomes healthy(HEALTHY, DEGRADED) or unhealthy(STOPPING, UNHEALTHY, UNDER_MAINTENANCE)
  • When If-None-Match=status=[HEALTHY|DEGRADED|UNHEALTHY|UNDER_MAINTENANCE|STOPPING] is present, return a response when the server's HealthStatus is changed.

@@ -435,24 +435,35 @@ public HttpResponse serve(ServiceRequestContext ctx, HttpRequest req) throws Exc
if (updateResult != null) {
switch (updateResult) {
case HEALTHY:
serverHealth.setHealthy(true);
serverHealth.setHealthStatus(HealthStatus.HEALTHY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: If the default HealthCheckUpdateHandler is used, only HEALTHY and UNHEALTHY will be returned.

I think users would find it surprising that the server is aware of options such as DEGRADED, etc.. but doesn't understand update semantics of such statuses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right. What do you think of adding an implementation of HealthCheckUpdateHandler to handle new status? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added HealthStatusUpdateHandler 😄

*
* @return {@code this}
*/
public HealthCheckServiceBuilder degradedResponse(AggregatedHttpResponse degradedResponse) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: from the perspective of health checking in envoy, just sending a x-envoy-degraded header to denote an endpoint is probably enough. Checked that this won't interfere with ongoing xDS related changes.

Copy link
Contributor Author

@seonWKim seonWKim Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems envoy use headers to denote the specific status. What do you think would be more appropriate for this case to denote specific status? The headers or status field in the json?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to let users customize the response for now - later depending on requirements we may just introduce a separate EnvoyHealthCheckService.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should do some research on the behavior of well known server implementations and make the default response covers them all, as a follow-up to this PR.

"{\"healthy\":false}");
private AggregatedHttpResponse healthyResponse =
AggregatedHttpResponse.of(HttpStatus.OK, MediaType.JSON_UTF_8,
"{\"healthy\":true,\"status\":\"HEALTHY\"}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: In case this point was missed, armeria client is ignoring the body at the moment and determining health status via status code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. HttpHealthChecker only checks the status code. Should we handle the changes in this PR as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure of how we want to expose health check information on the client-side at the moment. From an xDS perspective, other signals such as 1) the ratio of failed health checks 2) time since last health check 3) etc.. are exposed.

I think it's better to focus on implementing backwards compatible server-side changes in this PR, and client-side changes are dealt with separately (after a consensus is made on the API direction).

Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

* Handler which updates the healthiness of the {@link Server}. Supports {@code PUT}, {@code POST} and
* {@code PATCH} requests and tells if the {@link Server} needs to be marked as healthy or unhealthy.
*/
public enum DefaultHealthCheckUpdateHandler implements HealthCheckUpdateHandler {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the access modifier to public in order to let users select their update handler(DefaultHealthCheckUpdateHandlerorHealthStatusUpdateHandler`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) what do you think of just combining the two HealthCheckUpdateHandler and adding similar logic as HealthCheckService?

e.g.

if jsonNode.contains("healthy") {
  // previous logic
} else if jsonNode.contains("status") {
  // update to the status
} else {
  // 400
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once done, please don't forget to make this class package-private again.

@seonWKim seonWKim force-pushed the feature/ISSUE-5705-degradable-health-checker branch from 1548102 to 487997f Compare June 22, 2024 02:32
@seonWKim
Copy link
Contributor Author

Removed the [ISSUE-5741] prefix in the commit messages.

Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall 👍 Left a minor suggestion

* Handler which updates the healthiness of the {@link Server}. Supports {@code PUT}, {@code POST} and
* {@code PATCH} requests and tells if the {@link Server} needs to be marked as healthy or unhealthy.
*/
public enum DefaultHealthCheckUpdateHandler implements HealthCheckUpdateHandler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) what do you think of just combining the two HealthCheckUpdateHandler and adding similar logic as HealthCheckService?

e.g.

if jsonNode.contains("healthy") {
  // previous logic
} else if jsonNode.contains("status") {
  // update to the status
} else {
  // 400
}

@seonWKim
Copy link
Contributor Author

Question) what do you think of just combining the two HealthCheckUpdateHandler and adding similar logic as HealthCheckService?

Looks better. I don't think we need to split them into different classes. 👍

@jrhee17 jrhee17 added this to the 1.30.0 milestone Jun 26, 2024
Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the last of my comments.
Please also address https://github.com/line/armeria/pull/5741/files#r1651918233 🙏

@@ -46,22 +44,45 @@ public SettableHealthChecker() {
* changed using {@link #setHealthy(boolean)}.
*/
public SettableHealthChecker(boolean isHealthy) {
this.isHealthy = new AtomicBoolean(isHealthy);
this.healthStatus = isHealthy ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.healthStatus = isHealthy ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
healthStatus = isHealthy ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;

* Returns the {@link HttpStatus} representing the health status of the {@link Server}.
* Override below method if you want more fine-grained health status.
*/
default HealthStatus healthStatus() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify that ScheduledHealthChecker also uses the healthStatus API?

return new CpuHealthChecker(targetSystemCpuUsage, targetProcessCpuUsage,
degradedTargetSystemCpuUsage, degradedTargetProcessCpuLoad);
}

/**
* Returns {@code true} if and only if the {@link Server} is healthy.
*/
boolean isHealthy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I imagine that some time in the future we may likely end up deprecating this API since we're only using HealthChecker#healthStatus internally in HealthCheckService.

I prefer this be handled separately though if necessary.

Copy link
Contributor

@jrhee17 jrhee17 Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meanwhile, can you go through the code in armeria and convert any usages using this API to use healthStatus instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for degraded health
3 participants