[ISSUE-5705] Support for degraded health #5741
Thanks, @seonWKim! 🙇
Looks good overall 👍 I think some client-side changes will be needed, but I'd prefer to make them myself to ensure compatibility with xDS-side requirements. Left some minor comments 🙇
  final boolean useLongPolling;
  if (longPollingTimeoutMillis > 0) {
      final String expectedState =
              Ascii.toLowerCase(req.headers().get(HttpHeaderNames.IF_NONE_MATCH, ""));
      if ("\"healthy\"".equals(expectedState) || "w/\"healthy\"".equals(expectedState)) {
-         useLongPolling = isHealthy;
+         useLongPolling = healthStatus == HealthStatus.HEALTHY;
note: this change means if `HEALTHY` or `UNHEALTHY` isn't set, then long polling never works since armeria client only sends `healthy` or `unhealthy` depending on the status code.
armeria/core/src/main/java/com/linecorp/armeria/client/endpoint/healthcheck/HttpHealthChecker.java
Lines 105 to 106 in 579a6d0
    headers = builder.add(HttpHeaderNames.IF_NONE_MATCH,
                          wasHealthy ? "\"healthy\"" : "\"unhealthy\"")
Oh, I see. Then, I think we can change the logic as follows:
When the `expectedState` is
- `healthy`: use long polling when the status is `HEALTHY` or `DEGRADED`
- `unhealthy`: use long polling when the status is `UNHEALTHY`, `UNDER_MAINTENANCE`, or `STOPPING`
I understood the objective of this issue to be propagating the status to clients.
If we only maintain `expectedState`, I think clients may not receive the status on update for long polling health check requests.
If we would like to maintain backwards compatibility, we could define a separate syntax for status matching.
e.g.
- `If-None-Match=[healthy|unhealthy]`: propagates a signal whenever the status changes between healthy <-> unhealthy
- `If-None-Match=status=[HEALTHY|DEGRADED|UNHEALTHY|UNDER_MAINTENANCE|STOPPING]`: propagates a signal when the server's status does not match the given status exactly
I've added `If-None-Match=status=[HEALTHY|DEGRADED|UNHEALTHY|UNDER_MAINTENANCE|STOPPING]` and preserved `If-None-Match=[healthy|unhealthy]` in order to maintain backward compatibility.
- When `If-None-Match=[healthy|unhealthy]` is present, return a response when the server becomes `healthy` (`HEALTHY`, `DEGRADED`) or `unhealthy` (`STOPPING`, `UNHEALTHY`, `UNDER_MAINTENANCE`)
- When `If-None-Match=status=[HEALTHY|DEGRADED|UNHEALTHY|UNDER_MAINTENANCE|STOPPING]` is present, return a response when the server's `HealthStatus` is changed.
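The two `If-None-Match` forms described in this thread could be matched roughly as follows. This is a standalone sketch, not the code from the PR: the `HealthStatus` enum below is a local stand-in, and `LongPollingMatcher`, `shouldWait`, and the handling of quotes and the `W/` prefix are assumptions made for illustration. The method returns `true` while the server's current status still matches what the client already knows, which is when the server would keep the long-polling request pending.

```java
import java.util.Locale;

// Local stand-in for the HealthStatus enum discussed in this PR (assumed shape).
enum HealthStatus {
    HEALTHY(true), DEGRADED(true), UNHEALTHY(false), UNDER_MAINTENANCE(false), STOPPING(false);

    private final boolean healthy;

    HealthStatus(boolean healthy) {
        this.healthy = healthy;
    }

    boolean isHealthy() {
        return healthy;
    }
}

// Hypothetical helper; the name and signature are not taken from Armeria.
final class LongPollingMatcher {
    private LongPollingMatcher() {}

    static boolean shouldWait(String ifNoneMatch, HealthStatus current) {
        if (ifNoneMatch == null) {
            return false;
        }
        String value = ifNoneMatch.trim();
        // Drop an optional weak-validator prefix such as W/"healthy".
        if (value.regionMatches(true, 0, "W/", 0, 2)) {
            value = value.substring(2);
        }
        // Drop surrounding double quotes, if any.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        final String lower = value.toLowerCase(Locale.ROOT);
        // Legacy form: only the coarse healthy/unhealthy grouping is compared.
        if (lower.equals("healthy")) {
            return current.isHealthy();
        }
        if (lower.equals("unhealthy")) {
            return !current.isHealthy();
        }
        // New form: the exact status must match.
        if (lower.startsWith("status=")) {
            final String name = value.substring("status=".length()).toUpperCase(Locale.ROOT);
            try {
                return HealthStatus.valueOf(name) == current;
            } catch (IllegalArgumentException e) {
                return false; // Unknown status name: respond immediately.
            }
        }
        return false;
    }
}
```

With the legacy form, a client that sent `"healthy"` is not woken up when the server moves from `HEALTHY` to `DEGRADED`; with the `status=` form it is.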
@@ -435,24 +435,35 @@ public HttpResponse serve(ServiceRequestContext ctx, HttpRequest req) throws Exc
      if (updateResult != null) {
          switch (updateResult) {
              case HEALTHY:
-                 serverHealth.setHealthy(true);
+                 serverHealth.setHealthStatus(HealthStatus.HEALTHY);
Note: If the default `HealthCheckUpdateHandler` is used, only `HEALTHY` and `UNHEALTHY` will be returned.
I think users would find it surprising that the server is aware of statuses such as `DEGRADED` but doesn't understand the update semantics of such statuses.
Oh you're right. What do you think of adding an implementation of `HealthCheckUpdateHandler` to handle the new statuses? 🤔
Sounds good to me 👍
Added `HealthStatusUpdateHandler` 😄
     *
     * @return {@code this}
     */
    public HealthCheckServiceBuilder degradedResponse(AggregatedHttpResponse degradedResponse) {
note: from the perspective of health checking in Envoy, just sending an `x-envoy-degraded` header to denote that an endpoint is degraded is probably enough. Checked that this won't interfere with ongoing xDS related changes.
It seems Envoy uses headers to denote the specific status. Which do you think would be more appropriate for denoting the specific status in this case: the headers or a status field in the JSON?
I think it's fine to let users customize the response for now - later depending on requirements we may just introduce a separate `EnvoyHealthCheckService`.
We should probably do some research on the behavior of well-known server implementations and make the default response cover them all, as a follow-up to this PR.
                                      "{\"healthy\":false}");
    private AggregatedHttpResponse healthyResponse =
            AggregatedHttpResponse.of(HttpStatus.OK, MediaType.JSON_UTF_8,
                                      "{\"healthy\":true,\"status\":\"HEALTHY\"}");
Note: In case this point was missed, armeria client is ignoring the body at the moment and determining health status via status code.
Oh right. `HttpHealthChecker` only checks the status code. Should we handle the changes in this PR as well?
I'm unsure of how we want to expose health check information on the client-side at the moment. From an xDS perspective, other signals are exposed as well, such as 1) the ratio of failed health checks, 2) the time since the last health check, etc.
I think it's better to focus on implementing backwards compatible server-side changes in this PR and to deal with client-side changes separately (after a consensus is made on the API direction).
Almost there!
 * Handler which updates the healthiness of the {@link Server}. Supports {@code PUT}, {@code POST} and
 * {@code PATCH} requests and tells if the {@link Server} needs to be marked as healthy or unhealthy.
 */
public enum DefaultHealthCheckUpdateHandler implements HealthCheckUpdateHandler {
I changed the access modifier to public in order to let users select their update handler (`DefaultHealthCheckUpdateHandler` or `HealthStatusUpdateHandler`).
Question) what do you think of just combining the two `HealthCheckUpdateHandler` implementations and adding similar logic as `HealthCheckService`?
e.g.
    if jsonNode.contains("healthy") {
        // previous logic
    } else if jsonNode.contains("status") {
        // update to the status
    } else {
        // 400
    }
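The branching suggested above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the request body is modeled as an already-parsed `Map` instead of a JSON tree, `UpdateBodySketch` and `parse` are hypothetical names, the `HealthStatus` enum is a local stand-in, and an `IllegalArgumentException` stands in for the `400` response.

```java
import java.util.Locale;
import java.util.Map;

// Local stand-in for the HealthStatus enum discussed in this PR.
enum HealthStatus { HEALTHY, DEGRADED, UNHEALTHY, UNDER_MAINTENANCE, STOPPING }

// Hypothetical helper that merges the boolean and the status-based update formats.
final class UpdateBodySketch {
    private UpdateBodySketch() {}

    // Throws IllegalArgumentException where a real handler would answer 400 Bad Request.
    static HealthStatus parse(Map<String, ?> body) {
        final Object healthy = body.get("healthy");
        if (healthy != null) {
            // Previous logic: {"healthy": true|false}
            if (!(healthy instanceof Boolean)) {
                throw new IllegalArgumentException("'healthy' must be a boolean");
            }
            return ((Boolean) healthy) ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
        }
        final Object status = body.get("status");
        if (status instanceof String) {
            // New logic: {"status": "DEGRADED"}; valueOf rejects unknown names
            // with an IllegalArgumentException.
            return HealthStatus.valueOf(((String) status).toUpperCase(Locale.ROOT));
        }
        throw new IllegalArgumentException("expected a 'healthy' or 'status' field");
    }
}
```

Checking `healthy` first keeps existing clients working unchanged, which matches the backward-compatibility goal stated elsewhere in this review.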
Once done, please don't forget to make this class package-private again.
…bleHealthChecker`
Co-authored-by: jrhee17 <[email protected]>
…cify HealthStatus
- Add javadoc on `HealthStatus`
- Checks for IF-NONE-MATCH header for specific `HealthStatus`
- Add `HealthStatusUpdateHandler`
Removed the |
Looks good overall 👍 Left a minor suggestion
Looks better. I don't think we need to split them into different classes. 👍
This is probably the last of my comments.
Please also address https://github.com/line/armeria/pull/5741/files#r1651918233 🙏
@@ -46,22 +44,45 @@ public SettableHealthChecker() {
       * changed using {@link #setHealthy(boolean)}.
       */
      public SettableHealthChecker(boolean isHealthy) {
-         this.isHealthy = new AtomicBoolean(isHealthy);
+         this.healthStatus = isHealthy ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
- this.healthStatus = isHealthy ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
+ healthStatus = isHealthy ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
     * Returns the {@link HttpStatus} representing the health status of the {@link Server}.
     * Override below method if you want more fine-grained health status.
     */
    default HealthStatus healthStatus() {
Can you modify `ScheduledHealthChecker` so that it also uses the `healthStatus` API?
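To illustrate how a default `healthStatus()` method can coexist with the existing `isHealthy()` contract, here is a simplified, standalone sketch. The default body shown (deriving the status from `isHealthy()`) is an assumption, since the diff above does not show it; the `HealthChecker` and `HealthStatus` types are local stand-ins, and `QueueDepthHealthChecker` with its thresholds is entirely hypothetical.

```java
// Local stand-in for the HealthStatus enum discussed in this PR.
enum HealthStatus { HEALTHY, DEGRADED, UNHEALTHY, UNDER_MAINTENANCE, STOPPING }

// Simplified stand-in for the HealthChecker interface.
interface HealthChecker {
    boolean isHealthy();

    // Assumed default: existing boolean-only checkers map onto two statuses.
    default HealthStatus healthStatus() {
        return isHealthy() ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
    }
}

// Hypothetical checker that overrides healthStatus() to report DEGRADED.
final class QueueDepthHealthChecker implements HealthChecker {
    private final int degradedAt;
    private final int unhealthyAt;
    private volatile int depth;

    QueueDepthHealthChecker(int degradedAt, int unhealthyAt) {
        this.degradedAt = degradedAt;
        this.unhealthyAt = unhealthyAt;
    }

    void setDepth(int depth) {
        this.depth = depth;
    }

    @Override
    public boolean isHealthy() {
        // A degraded server still counts as healthy for boolean callers.
        return healthStatus() != HealthStatus.UNHEALTHY;
    }

    @Override
    public HealthStatus healthStatus() {
        final int current = depth;
        if (current >= unhealthyAt) {
            return HealthStatus.UNHEALTHY;
        }
        if (current >= degradedAt) {
            return HealthStatus.DEGRADED;
        }
        return HealthStatus.HEALTHY;
    }
}
```

Because the default method is derived from `isHealthy()`, a checker written as a lambda such as `() -> false` keeps compiling and reports `UNHEALTHY` without any changes.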
        return new CpuHealthChecker(targetSystemCpuUsage, targetProcessCpuUsage,
                                    degradedTargetSystemCpuUsage, degradedTargetProcessCpuLoad);
    }

    /**
     * Returns {@code true} if and only if the {@link Server} is healthy.
     */
    boolean isHealthy();
Note: I imagine that some time in the future we may likely end up deprecating this API since we're only using `HealthChecker#healthStatus` internally in `HealthCheckService`.
I prefer this be handled separately though if necessary.
Meanwhile, can you go through the code in armeria and convert any usages of this API to use `healthStatus` instead?
Motivation:
Add support for degraded health.
Modifications:
- `HealthStatus` enum which can represent the health status of a server
- `healthStatus()` method in `HealthChecker` for fine-grained representation of the server's status
- `degradedResponse` which shows that the server's status is degraded.
I think adding a new health check service which returns `HealthStatus` (instead of boolean) might be an alternate option. But because this PR has not been reviewed yet, I've just implemented the feature on top of `HealthCheckService`.
Result:
- Users can now receive notifications for `degraded` server status in addition to healthy and unhealthy.
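As a rough illustration of what a client might observe, the sketch below builds a JSON body per status. It assumes that the `{"healthy":...,"status":"..."}` shape of the default healthy response in this PR is reused for every status and that `DEGRADED` is reported with `"healthy":true`; neither is confirmed by the conversation. `HealthBodySketch` and `body` are hypothetical names, and the enum is a local stand-in.

```java
// Local stand-in for the HealthStatus enum discussed in this PR (assumed grouping).
enum HealthStatus {
    HEALTHY(true), DEGRADED(true), UNHEALTHY(false), UNDER_MAINTENANCE(false), STOPPING(false);

    private final boolean healthy;

    HealthStatus(boolean healthy) {
        this.healthy = healthy;
    }

    boolean isHealthy() {
        return healthy;
    }
}

// Hypothetical helper that renders a response body for a given status.
final class HealthBodySketch {
    private HealthBodySketch() {}

    static String body(HealthStatus status) {
        // Enum names contain only letters and underscores, so no JSON escaping is needed.
        return "{\"healthy\":" + status.isHealthy() + ",\"status\":\"" + status.name() + "\"}";
    }
}
```

Keeping the boolean `healthy` field alongside `status` lets consumers that only understand the old body continue to work.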