AMF-991: Support for max.connection.duration.ms #483

niteshmor · 2024-05-30T00:53:41Z

For situations where a number of instances of a rest-utils application are behind an NLB, long running client connections (especially for HTTP 2.0) can cause uneven load distribution.

This pull request adds support for terminating long running client connections after a specified duration passed in as a config item.

Add support for terminating long running client connections after a specified duration.

cla-assistant · 2024-05-30T00:53:54Z

All committers have signed the CLA.

ehumber · 2024-05-30T10:41:01Z

core/src/main/java/io/confluent/rest/RestConfig.java

@@ -478,6 +478,13 @@ public class RestConfig extends AbstractConfig {
          + "If the limit is set to a non-positive number, no limit is applied. Default is 0.";
  private static final int SERVER_CONNECTION_LIMIT_DEFAULT = 0;

+  public static final String MAX_CONNECTION_DURATION_MS = "max.connection.duration.ms";
+  public static final String MAX_CONNECTION_DURATION_MS_DOC =
+      "The maximum duration in milliseconds that a connection can be open. "


Is it worth clarifying this is the TCP connection the requests come over, to help differentiate this config from

rest-utils/core/src/main/java/io/confluent/rest/RestConfig.java

Line 444 in a04ed80

"Length of time, in ms, to allow the request to run. Default is 30000L.";

?

CMIIW. As i understand, this config only applies for connections that are not-idle, i.e i.e. receiving http requests. So should this be renamed to MAX_NON_IDLE_TCP_CONNECTION_DURATION_MS ?

And add to the comment that for idle connections MAX_IDLE_TIMEOUT_MS should be configured appropriately. Wdyt?

Good points on documentation update. Will fix.

On the naming of the config item, MAX_NON_IDLE_TCP_CONNECTION_DURATION_MS could be confusing -- one might wonder whether the counter start when the connection is non idle and gets reset for idle connection, and all sorts of similar things. The config is an absolute maximum limit for how long a single connection is allowed to be open, regardless of whether that's idle or not.

The config is an absolute maximum limit for how long a single connection is allowed to be open, regardless of whether that's idle or not.

is this correct though?

Jetty Filter is run on a request, or a response. So if connection is idle, i.e. no request or response, this filter won't run. So only for non-idle connection this config would work. That's why i suggested above(quoted below) that for idle connections MAX_IDLE_TIMEOUT_MS should be set.

And add to the comment that for idle connections MAX_IDLE_TIMEOUT_MS should be configured appropriately.

re. naming, I am ok with the current name as naming is hard 😃 . As long the comment makes the right behaviour explicit.

Ah yes, you are right on the caveat of idle connections. My narrow-minded focus for adding this feature was specifically for very active connections that last days/weeks.
A combination of "idle timeout" + this "max connection duration (reworded differently)" should go hand-in-hand for any config tuning.
I will update the documentation to reflect that.

trnguyencflt · 2024-05-30T10:50:33Z

core/src/main/java/io/confluent/rest/filters/ConnectionDurationFilter.java

+public class ConnectionDurationFilter implements Filter {
+
+  private static final Logger log = LoggerFactory.getLogger(ConnectionDurationFilter.class);
+  private static long MAX_CONNECTION_DURATION_MS = -1;


is there a reason why MAX_CONNECTION_DURATION_MS is static?

Typo. will fix.
thanks for pointing out.

trnguyencflt · 2024-05-30T10:51:26Z

core/src/main/java/io/confluent/rest/filters/ConnectionDurationFilter.java

+
+    HttpServletResponse httpServletResponse = (HttpServletResponse) servletResponse;
+
+    HttpChannel channel = ((Response) httpServletResponse).getHttpChannel();


we use several unchecked cast here, are they all safe?

Good point. The typecasts in general should be safe given that rest-utils is specifically targeting the HTTP request serving, but I will add some checks just to be on the safe side and to avoid any issues in the future.

trnguyencflt · 2024-05-30T10:53:44Z

core/src/main/java/io/confluent/rest/filters/ConnectionDurationFilter.java

+    if (connectionAge > MAX_CONNECTION_DURATION_MS) {
+      log.debug("Connection from remote peer {} has been active for {}ms. Closing the connection.",
+          channel.getRemoteAddress(), connectionAge);
+      channel.getEndPoint().close();


should we also use the filterChain to stop processing?

Not sure I know how to do it. The "don't keep processing the request" sounds reasonable (which is what I think you are referring to), but unfortunately Filterchain only exposes a doFilter method.
Do you have some suggestion on how to go about it?

You can do that by not calling filterChain.doFilter, i.e. return early in if block at line 68.

trnguyencflt · 2024-05-30T10:54:09Z

core/src/test/java/io/confluent/rest/ConnectionDurationFilterTest.java

+            new MyAddressConfig(
+                ImmutableMap.of(
+                    "listeners", "http://localhost:8080",
+                    "http2.enabled", "true",


is this applicable for http1.1?

yes.
"http2.enabled" simply says whether http 2.0 is supported or not.
http 1 is supported by default

shall we make the tests parameterized on http1.1 and http2

trnguyencflt · 2024-05-30T10:55:29Z

core/src/test/java/io/confluent/rest/ConnectionDurationFilterTest.java

+    for (int i=0; i<5; i++) {
+
+      HttpGet request = createRequest(server.getURI());
+      assertEquals("HTTP/1.1", request.getProtocolVersion().toString());


why the request is http1.1 when your server has http2 enabled?

That's a limitation of the Apache http client.
HTTP client version 4.x only supports http 1.x.
HTTP 2.x support was added in version 5.x (see here).

Using 5.x http client would require a major version upgrade of that package -- there are a number of major changes (including relocated packages, etc) making 4.x -> 5.x migration non-trivial (probably a separate future pull request on its own).

I think we have jetty http client code example across this repo. You might want to take a look.

trnguyencflt · 2024-05-30T10:57:22Z

core/src/test/java/io/confluent/rest/ConnectionDurationFilterTest.java

I don't really see the test where the client connection is terminated due to exceeding max duration, could you please give some pointeR?

The current tests echo back the client address (IP address + client-port), which is a proxy for the connection itself.
If the same connection is reused, then the (IP address + client-port) pair stays the same across the requests.
If a new connection is established, the client-port changes.
The current tests try to make a number of calls with a delay longer than the "max connection duration", and assert that a new (IP address + client-port) was used for each of them.

I now see how that is somewhat obtuse, and all this explanation probably needs to make it to the test itself as a comment. What do you think? Does that make sense and address the need for the tests?

On why I had to choose this complicated roundabout way of verifying that the connections are terminated -- http clients (both apache http client and okhttp client) don't really have any public interface to give socket level information back to the user. Both of them like to use internal mechanisms to maintain connection pools or attempt retries without necessarily exposing what's happening at the connection level back to the user. That's where I ended up conjuring up a server application that echoes back the (client IP + client-port) pair.

I should also add a case when multiple requests are made all within the "max connection duration", and show that they indeed are using the same connection under the hood.

Merging the other comment https://github.com/confluentinc/rest-utils/pull/483/files#r1621942383 here as they are similar.

I now see how that is somewhat obtuse, and all this explanation probably needs to make it to the test itself as a comment.

+1

Both of them like to use internal mechanisms to maintain connection pools or attempt retries without necessarily exposing what's happening at the connection level back to the user.

Is it possible to set the http retries to 0 in http-clients? So that client only sends HTTP request once, max_duration exceeds -> connection closed, and request fails. As retries are 0, the error/exception is passed back to the test.

It would be gr8 to have the behaviour(connection disconnects -> test sees error) explicit in test, as that is much easier to reason in future.

I think you can test this indirectly with client sending multiple requests for a span of time, and since the connection exceed the max time, the remaining requests will not complete.

Yeah setting retries =>0 sounds like a good idea.
Let me rework the tests a bit incorporating all the feedback, thank you for all the suggestions.

I also realized that the "ephemeral client" is not really an ephemeral client. This snippet of code that I got from some other tests in the repo doesn't actually create ephemeral clients. Setting that to "-1" essentially means "No suggestion on keepalive strategy" (ref). 🤦

private static CloseableHttpClient createEphemeralClient() { return HttpClients.custom() .setConnectionManager(new BasicHttpClientConnectionManager()) .setKeepAliveStrategy((httpResponse, httpContext) -> -1) .build(); }

msn-tldr · 2024-05-30T10:59:34Z

core/src/main/java/io/confluent/rest/RestConfig.java

@@ -478,6 +478,13 @@ public class RestConfig extends AbstractConfig {
          + "If the limit is set to a non-positive number, no limit is applied. Default is 0.";
  private static final int SERVER_CONNECTION_LIMIT_DEFAULT = 0;

+  public static final String MAX_CONNECTION_DURATION_MS = "max.connection.duration.ms";
+  public static final String MAX_CONNECTION_DURATION_MS_DOC =
+      "The maximum duration in milliseconds that a connection can be open. "


msn-tldr · 2024-05-30T11:00:35Z

core/src/main/java/io/confluent/rest/filters/ConnectionDurationFilter.java

+ *
+ *
+ * <p>Long-running client connections can be problematic for multiple server instances behind
+ * an NLB due to uneven load distribution (especially in case of HTTP/2.0 that specifically


Even HTTP 1.1 uses Persistent connections by default. So this could be helpful there.

good point. will clarify with a documentation update

msn-tldr · 2024-05-30T11:01:48Z

core/src/main/java/io/confluent/rest/filters/ConnectionDurationFilter.java

+
+    HttpServletResponse httpServletResponse = (HttpServletResponse) servletResponse;
+
+    HttpChannel channel = ((Response) httpServletResponse).getHttpChannel();


msn-tldr · 2024-05-30T11:25:32Z

core/src/main/java/io/confluent/rest/RestConfig.java

@@ -478,6 +478,13 @@ public class RestConfig extends AbstractConfig {
          + "If the limit is set to a non-positive number, no limit is applied. Default is 0.";
  private static final int SERVER_CONNECTION_LIMIT_DEFAULT = 0;

+  public static final String MAX_CONNECTION_DURATION_MS = "max.connection.duration.ms";
+  public static final String MAX_CONNECTION_DURATION_MS_DOC =
+      "The maximum duration in milliseconds that a connection can be open. "


CMIIW. As i understand, this config only applies for connections that are not-idle, i.e i.e. receiving http requests. So should this be renamed to MAX_NON_IDLE_TCP_CONNECTION_DURATION_MS ?

And add to the comment that for idle connections MAX_IDLE_TIMEOUT_MS should be configured appropriately. Wdyt?

core/src/test/java/io/confluent/rest/ConnectionDurationFilterTest.java

msn-tldr · 2024-05-30T11:43:50Z

core/src/test/java/io/confluent/rest/ConnectionDurationFilterTest.java

+  @Test
+  public void connectionDurationTest_ApacheHttpClient_Ephemeral_HTTP1() throws Exception {
+
+    CloseableHttpClient client = createPersistentClient();


Should this be creating an ephemeral client given the test is called ephemeral? Naming should be consistent.

Apologies. It's a typo. will filx

msn-tldr · 2024-05-30T11:44:45Z

core/src/test/java/io/confluent/rest/ConnectionDurationFilterTest.java

+  @Test
+  public void connectionDurationTest_ApacheHttpClient_Persistent_HTTP1() throws Exception {
+
+    CloseableHttpClient client = createEphemeralClient();


Naming should be consistent, similar to comment on line 83.

Apologies. It's a typo. will filx

msn-tldr · 2024-05-30T11:45:43Z

core/src/test/java/io/confluent/rest/ConnectionDurationFilterTest.java

+
+      response.close();
+      addressList.add(responseBody);
+      Thread.sleep(MAX_CONNECTION_DURATION_MS + 10);


Going by the test name, is this test creating "persistent" tcp connections? So should the request fail given the connection remains idle for longer than MAX_CONNECTION_DURATION_MS

niteshmor · 2024-06-14T17:31:31Z

Just providing a quick update for reviewers on this -- I tried to use the jetty http client for some of the tests, and it seems like that client is not happy with the current approach (which is a good enough indication that there might be other clients in other languages exhibiting the same behavior).

Specifically, the current approach of using a request filter terminates the connection after the server has ack-ed (at the TCP level) that it received a request (this is for http 1.1). That causes EOF exceptions in the jetty http client because it is very much hoping to get a response back for a request it was able to successfully write on the socket.

Just for testing purposes, I tried to terminate the connection asynchronously a few milliseconds after the response was returned, and the jetty http client is much more happier and accommodating of the fact that the connection was already closed (i.e. socket writing failed) when it tried to send a subsequent request and automatically opens a new connection under the hood.

I am checking on how I can rework the solution (maybe in this PR or a separate one, depending on how drastically the changes look). Will post an update when I have something reasonably workable.

trnguyencflt · 2024-06-17T08:17:54Z

@niteshmor thanks for the update 👍

AMF-991: Support for max.connection.duration.ms

595e7d5

Add support for terminating long running client connections after a specified duration.

checkstyle fixes

a680325

niteshmor marked this pull request as ready for review May 30, 2024 08:58

niteshmor requested review from a team as code owners May 30, 2024 08:58

ehumber reviewed May 30, 2024

View reviewed changes

trnguyencflt reviewed May 30, 2024

View reviewed changes

msn-tldr reviewed May 30, 2024

View reviewed changes


		HttpServletResponse httpServletResponse = (HttpServletResponse) servletResponse;

		HttpChannel channel = ((Response) httpServletResponse).getHttpChannel();

AMF-991: Support for max.connection.duration.ms #483

Are you sure you want to change the base?

AMF-991: Support for max.connection.duration.ms #483

Conversation

niteshmor commented May 30, 2024 • edited Loading

cla-assistant bot commented May 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

msn-tldr May 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

msn-tldr May 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

niteshmor commented Jun 14, 2024

trnguyencflt commented Jun 17, 2024

niteshmor commented May 30, 2024 •

edited

Loading

cla-assistant bot commented May 30, 2024 •

edited

Loading

msn-tldr May 30, 2024 •

edited

Loading

msn-tldr May 30, 2024 •

edited

Loading