
Avoid late records preemptively rotating/committing S3 output files #574

Open
wants to merge 2 commits into master from feature/avoid-many-small-s3-files-on-late-data

Conversation

frankgrimes97

When late data arrives on a Kafka partition (e.g. data for the previous hourly encodedPartition), the following check triggers an immediate rotation and commit of files:

https://github.com/confluentinc/kafka-connect-storage-cloud/blob/918730d011dcd199e810ec3a68a03ab01c927f62/kafka-connect-s3/src/main/java/io/confluent/connect/s3/TopicPartitionWriter.java#L410

The problem is exacerbated when late data is interleaved with up-to-date data: a quick succession of rotations causes a large number of small files to be committed to S3. This hurts both the performance/throughput of Kafka Connect and downstream consumers, which must deal with the many small file fragments.
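The check in question looks roughly like this (a paraphrased sketch of the linked `TopicPartitionWriter` code, not a verbatim excerpt; the fields referenced are members of that class):

```java
// Paraphrased sketch of the time-based rotation check in TopicPartitionWriter.
// Note the encodedPartition inequality: a late record whose encodedPartition
// differs from the currently open one (e.g. last hour's data) forces an
// immediate rotation, regardless of flush.size or rotate.interval.ms.
private boolean rotateOnTime(String encodedPartition, Long recordTimestamp, long now) {
  if (recordCount <= 0) {
    return false;
  }
  boolean periodicRotation = rotateIntervalMs > 0
      && timestampExtractor != null
      && (recordTimestamp - baseRecordTimestamp >= rotateIntervalMs
          || !encodedPartition.equals(currentEncodedPartition));
  boolean scheduledRotation = rotateScheduleIntervalMs > 0
      && now >= nextScheduledRotation;
  return periodicRotation || scheduledRotation;
}
```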

This PR adds a new `max.open.files.per.partition` setting to S3SinkConnectorConfig. It defaults to 1, which preserves the existing behavior (see the example configuration after the list below).

If set to a value > 1, the following behavior is enabled:

- A separate commit file is kept open for each encodedPartition target, up to a maximum of `max.open.files.per.partition`.

- Rotation occurs only when one of the encodedPartition targets hits its rotation condition (`flush.size`, `rotate.interval.ms`), at which point all open files are committed. Committing all files together ensures the S3 sink's pre-commit hook commits a high-watermark offset to the Kafka consumer group with no buffered, still-in-flight data behind it.
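For illustration, a minimal standalone connector config exercising the new option might look like the following; every property except `max.open.files.per.partition` is a standard S3 sink setting with placeholder values:

```properties
name=s3-sink-late-data
connector.class=io.confluent.connect.s3.S3SinkConnector
topics=events
s3.bucket.name=my-bucket
s3.region=us-east-1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
partition.duration.ms=3600000
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
locale=en-US
timezone=UTC
# Partition by the record's own timestamp, so late records map to their
# true (earlier) hourly encodedPartition.
timestamp.extractor=Record
flush.size=10000
rotate.interval.ms=3600000
# New in this PR: allow up to 3 encodedPartition files open concurrently
# instead of rotating as soon as a late record arrives (default 1).
max.open.files.per.partition=3
```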

It's worth noting that this issue/limitation was previously encountered and is well-described as part of:
"CC-2313 Handle late arriving records in storage cloud sink connectors" #187

However, that feature was subsequently reverted:
a2ce6fc confluentinc/kafka-connect-storage-common#87

N.B. Unlike the solution proposed in CC-2313, we do not opt to write late data to an incorrect encodedPartition; i.e. late data for hour 7 will not land in a path/file for hour 8.
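A minimal sketch of the bookkeeping this implies follows (illustrative only; the class and method names here, such as `PerPartitionWriters` and `commitAll`, are hypothetical, and the real change lives in `TopicPartitionWriter`):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical stand-ins.
class PerPartitionWriters {

  // Stand-ins for the connector's record and output-file abstractions.
  static class Record { }

  static class Writer {
    final String encodedPartition;
    Writer(String encodedPartition) { this.encodedPartition = encodedPartition; }
    void append(Record r) { /* buffer into the open S3 output file */ }
    boolean shouldRotate() { /* flush.size or rotate.interval.ms reached? */ return false; }
    void commit() { /* finalize the upload for this file */ }
  }

  private final int maxOpenFilesPerPartition; // max.open.files.per.partition
  private final Map<String, Writer> openWriters = new HashMap<>();

  PerPartitionWriters(int maxOpenFilesPerPartition) {
    this.maxOpenFilesPerPartition = maxOpenFilesPerPartition;
  }

  void write(String encodedPartition, Record record) {
    Writer writer = openWriters.get(encodedPartition);
    if (writer == null) {
      if (openWriters.size() >= maxOpenFilesPerPartition) {
        commitAll(); // cap reached: rotate everything before opening another file
      }
      writer = new Writer(encodedPartition);
      openWriters.put(encodedPartition, writer);
    }
    writer.append(record);
    // Only a genuine rotation condition triggers a commit; a late record
    // alone no longer does. All open files are committed together so the
    // pre-commit hook can advance a high-watermark offset with no buffered
    // gaps behind it.
    if (writer.shouldRotate()) {
      commitAll();
    }
  }

  private void commitAll() {
    for (Writer w : openWriters.values()) {
      w.commit();
    }
    openWriters.clear();
  }
}
```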

@frankgrimes97 frankgrimes97 requested a review from a team as a code owner October 21, 2022 15:08
@frankgrimes97 frankgrimes97 force-pushed the feature/avoid-many-small-s3-files-on-late-data branch from 9351a51 to 0af783e on October 21, 2022 15:22
@frankgrimes97 frankgrimes97 changed the title Avoid late records preemptively rotating/commiting S3 output files Avoid late records preemptively rotating/committing S3 output files Oct 21, 2022
@frankgrimes97 (Author)

Hi, it's not clear to me that the Jenkins-public-CI integration test failures are due to my code changes.
"AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied;"
Is that CI check broken? (I see most other outside-contributed PRs have similar failed checks)

The default baseRecordTimestamp was incorrect, leading to
unnecessary file committing and rotation.
Update unit test case accordingly.
@frankgrimes97 (Author)

Quick follow-up... we've been running this feature branch stably and successfully for a number of weeks now; it has both improved throughput and reduced the proliferation of unnecessarily small files when late data is processed by the S3 Sink Connector.

We'd very much prefer to have this ultimately merged upstream to avoid needing to maintain our own fork moving forward.
Is there any interest on Confluent's side in working with us to get this merged?

Cheers!

@frankgrimes97 (Author)

@kkonstantine I see you worked on and approved #187. Any chance you could help us get this PR reviewed and hopefully merged upstream? Thanks!
