Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMF-991: Support for max.connection.duration.ms #483

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

niteshmor
Copy link
Contributor

@niteshmor niteshmor commented May 30, 2024

For situations where a number of instances of a rest-utils application are behind an NLB, long running client connections (especially for HTTP 2.0) can cause uneven load distribution.

This pull request adds support for terminating long running client connections after a specified duration passed in as a config item.

Add support for terminating long running client connections
after a specified duration.
Copy link

cla-assistant bot commented May 30, 2024

CLA assistant check
All committers have signed the CLA.

@niteshmor niteshmor marked this pull request as ready for review May 30, 2024 08:58
@niteshmor niteshmor requested review from a team as code owners May 30, 2024 08:58
@@ -478,6 +478,13 @@ public class RestConfig extends AbstractConfig {
+ "If the limit is set to a non-positive number, no limit is applied. Default is 0.";
private static final int SERVER_CONNECTION_LIMIT_DEFAULT = 0;

public static final String MAX_CONNECTION_DURATION_MS = "max.connection.duration.ms";
public static final String MAX_CONNECTION_DURATION_MS_DOC =
"The maximum duration in milliseconds that a connection can be open. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth clarifying this is the TCP connection the requests come over, to help differentiate this config from

"Length of time, in ms, to allow the request to run. Default is 30000L.";
?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

@msn-tldr msn-tldr May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMIIW. As i understand, this config only applies for connections that are not-idle, i.e i.e. receiving http requests. So should this be renamed to MAX_NON_IDLE_TCP_CONNECTION_DURATION_MS ?

And add to the comment that for idle connections MAX_IDLE_TIMEOUT_MS should be configured appropriately. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points on documentation update. Will fix.

On the naming of the config item, MAX_NON_IDLE_TCP_CONNECTION_DURATION_MS could be confusing -- one might wonder whether the counter start when the connection is non idle and gets reset for idle connection, and all sorts of similar things. The config is an absolute maximum limit for how long a single connection is allowed to be open, regardless of whether that's idle or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config is an absolute maximum limit for how long a single connection is allowed to be open, regardless of whether that's idle or not.

is this correct though?

Jetty Filter is run on a request, or a response. So if connection is idle, i.e. no request or response, this filter won't run. So only for non-idle connection this config would work. That's why i suggested above(quoted below) that for idle connections MAX_IDLE_TIMEOUT_MS should be set.

And add to the comment that for idle connections MAX_IDLE_TIMEOUT_MS should be configured appropriately.

re. naming, I am ok with the current name as naming is hard 😃 . As long the comment makes the right behaviour explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you are right on the caveat of idle connections. My narrow-minded focus for adding this feature was specifically for very active connections that last days/weeks.
A combination of "idle timeout" + this "max connection duration (reworded differently)" should go hand-in-hand for any config tuning.
I will update the documentation to reflect that.

public class ConnectionDurationFilter implements Filter {

private static final Logger log = LoggerFactory.getLogger(ConnectionDurationFilter.class);
private static long MAX_CONNECTION_DURATION_MS = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why MAX_CONNECTION_DURATION_MS is static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo. will fix.
thanks for pointing out.


HttpServletResponse httpServletResponse = (HttpServletResponse) servletResponse;

HttpChannel channel = ((Response) httpServletResponse).getHttpChannel();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use several unchecked cast here, are they all safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The typecasts in general should be safe given that rest-utils is specifically targeting the HTTP request serving, but I will add some checks just to be on the safe side and to avoid any issues in the future.

if (connectionAge > MAX_CONNECTION_DURATION_MS) {
log.debug("Connection from remote peer {} has been active for {}ms. Closing the connection.",
channel.getRemoteAddress(), connectionAge);
channel.getEndPoint().close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also use the filterChain to stop processing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I know how to do it. The "don't keep processing the request" sounds reasonable (which is what I think you are referring to), but unfortunately Filterchain only exposes a doFilter method.
Do you have some suggestion on how to go about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do that by not calling filterChain.doFilter, i.e. return early in if block at line 68.

new MyAddressConfig(
ImmutableMap.of(
"listeners", "http://localhost:8080",
"http2.enabled", "true",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this applicable for http1.1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.
"http2.enabled" simply says whether http 2.0 is supported or not.
http 1 is supported by default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make the tests parameterized on http1.1 and http2

for (int i=0; i<5; i++) {

HttpGet request = createRequest(server.getURI());
assertEquals("HTTP/1.1", request.getProtocolVersion().toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the request is http1.1 when your server has http2 enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a limitation of the Apache http client.
HTTP client version 4.x only supports http 1.x.
HTTP 2.x support was added in version 5.x (see here).

Using 5.x http client would require a major version upgrade of that package -- there are a number of major changes (including relocated packages, etc) making 4.x -> 5.x migration non-trivial (probably a separate future pull request on its own).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have jetty http client code example across this repo. You might want to take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the test where the client connection is terminated due to exceeding max duration, could you please give some pointeR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current tests echo back the client address (IP address + client-port), which is a proxy for the connection itself.
If the same connection is reused, then the (IP address + client-port) pair stays the same across the requests.
If a new connection is established, the client-port changes.
The current tests try to make a number of calls with a delay longer than the "max connection duration", and assert that a new (IP address + client-port) was used for each of them.

I now see how that is somewhat obtuse, and all this explanation probably needs to make it to the test itself as a comment. What do you think? Does that make sense and address the need for the tests?

On why I had to choose this complicated roundabout way of verifying that the connections are terminated -- http clients (both apache http client and okhttp client) don't really have any public interface to give socket level information back to the user. Both of them like to use internal mechanisms to maintain connection pools or attempt retries without necessarily exposing what's happening at the connection level back to the user. That's where I ended up conjuring up a server application that echoes back the (client IP + client-port) pair.

I should also add a case when multiple requests are made all within the "max connection duration", and show that they indeed are using the same connection under the hood.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging the other comment https://github.com/confluentinc/rest-utils/pull/483/files#r1621942383 here as they are similar.

I now see how that is somewhat obtuse, and all this explanation probably needs to make it to the test itself as a comment.

+1

Both of them like to use internal mechanisms to maintain connection pools or attempt retries without necessarily exposing what's happening at the connection level back to the user.

Is it possible to set the http retries to 0 in http-clients? So that client only sends HTTP request once, max_duration exceeds -> connection closed, and request fails. As retries are 0, the error/exception is passed back to the test.

It would be gr8 to have the behaviour(connection disconnects -> test sees error) explicit in test, as that is much easier to reason in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can test this indirectly with client sending multiple requests for a span of time, and since the connection exceed the max time, the remaining requests will not complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah setting retries =>0 sounds like a good idea.
Let me rework the tests a bit incorporating all the feedback, thank you for all the suggestions.

I also realized that the "ephemeral client" is not really an ephemeral client. This snippet of code that I got from some other tests in the repo doesn't actually create ephemeral clients. Setting that to "-1" essentially means "No suggestion on keepalive strategy" (ref). 🤦

private static CloseableHttpClient createEphemeralClient() {
    return HttpClients.custom()
        .setConnectionManager(new BasicHttpClientConnectionManager())
        .setKeepAliveStrategy((httpResponse, httpContext) -> -1)
        .build();
  }

@@ -478,6 +478,13 @@ public class RestConfig extends AbstractConfig {
+ "If the limit is set to a non-positive number, no limit is applied. Default is 0.";
private static final int SERVER_CONNECTION_LIMIT_DEFAULT = 0;

public static final String MAX_CONNECTION_DURATION_MS = "max.connection.duration.ms";
public static final String MAX_CONNECTION_DURATION_MS_DOC =
"The maximum duration in milliseconds that a connection can be open. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

*
*
* <p>Long-running client connections can be problematic for multiple server instances behind
* an NLB due to uneven load distribution (especially in case of HTTP/2.0 that specifically
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even HTTP 1.1 uses Persistent connections by default. So this could be helpful there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. will clarify with a documentation update


HttpServletResponse httpServletResponse = (HttpServletResponse) servletResponse;

HttpChannel channel = ((Response) httpServletResponse).getHttpChannel();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -478,6 +478,13 @@ public class RestConfig extends AbstractConfig {
+ "If the limit is set to a non-positive number, no limit is applied. Default is 0.";
private static final int SERVER_CONNECTION_LIMIT_DEFAULT = 0;

public static final String MAX_CONNECTION_DURATION_MS = "max.connection.duration.ms";
public static final String MAX_CONNECTION_DURATION_MS_DOC =
"The maximum duration in milliseconds that a connection can be open. "
Copy link
Member

@msn-tldr msn-tldr May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMIIW. As i understand, this config only applies for connections that are not-idle, i.e i.e. receiving http requests. So should this be renamed to MAX_NON_IDLE_TCP_CONNECTION_DURATION_MS ?

And add to the comment that for idle connections MAX_IDLE_TIMEOUT_MS should be configured appropriately. Wdyt?

@Test
public void connectionDurationTest_ApacheHttpClient_Ephemeral_HTTP1() throws Exception {

CloseableHttpClient client = createPersistentClient();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be creating an ephemeral client given the test is called ephemeral? Naming should be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies. It's a typo. will filx

@Test
public void connectionDurationTest_ApacheHttpClient_Persistent_HTTP1() throws Exception {

CloseableHttpClient client = createEphemeralClient();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming should be consistent, similar to comment on line 83.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies. It's a typo. will filx


response.close();
addressList.add(responseBody);
Thread.sleep(MAX_CONNECTION_DURATION_MS + 10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going by the test name, is this test creating "persistent" tcp connections? So should the request fail given the connection remains idle for longer than MAX_CONNECTION_DURATION_MS

@niteshmor
Copy link
Contributor Author

Just providing a quick update for reviewers on this -- I tried to use the jetty http client for some of the tests, and it seems like that client is not happy with the current approach (which is a good enough indication that there might be other clients in other languages exhibiting the same behavior).

Specifically, the current approach of using a request filter terminates the connection after the server has ack-ed (at the TCP level) that it received a request (this is for http 1.1). That causes EOF exceptions in the jetty http client because it is very much hoping to get a response back for a request it was able to successfully write on the socket.

Just for testing purposes, I tried to terminate the connection asynchronously a few milliseconds after the response was returned, and the jetty http client is much more happier and accommodating of the fact that the connection was already closed (i.e. socket writing failed) when it tried to send a subsequent request and automatically opens a new connection under the hood.

I am checking on how I can rework the solution (maybe in this PR or a separate one, depending on how drastically the changes look). Will post an update when I have something reasonably workable.

@trnguyencflt
Copy link
Member

@niteshmor thanks for the update 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants