
[Feature][Zeta] Add Master and Worker split mode deployment #6947

Open
wants to merge 8 commits into base: dev
Conversation

@EricJoy2048 (Member) commented Jun 5, 2024

Purpose of this pull request

The Zeta Master service is separated from the Worker service, so each role can be deployed on its own nodes.

Does this PR introduce any user-facing change?

Yes. Since this PR, users can start a Zeta cluster node with `-r <role>`, where the role can be `master_and_worker`, `master`, or `worker`. More information can be found in the documentation added in this PR.

How was this patch tested?

Check list

@EricJoy2048 changed the title from "240329 test split master worker" to "[Feature][Zeta] Add Master and Worker split mode deployment" on Jun 5, 2024
@EricJoy2048 force-pushed the 240329_test_split_master_worker branch 13 times, most recently from 3cb8283 to 47f5f24 on June 11, 2024 13:09
@EricJoy2048 force-pushed the 240329_test_split_master_worker branch 16 times, most recently from ea34249 to c631ef7 on June 15, 2024 06:33
@EricJoy2048 force-pushed the 240329_test_split_master_worker branch 2 times, most recently from 1d1a58b to 94ff17c on June 15, 2024 13:24
@EricJoy2048 force-pushed the 240329_test_split_master_worker branch 4 times, most recently from 3c1378f to 324972a on June 18, 2024 11:57
@EricJoy2048 added this to the 2.3.6 milestone on Jun 18, 2024
@gitfortian (Contributor) left a comment:

great

@EricJoy2048 force-pushed the 240329_test_split_master_worker branch from 12b212c to 7b6882f on June 27, 2024 02:43
@dailai (Contributor) left a comment:

LGTM if CI passes.

@@ -0,0 +1,70 @@
---

The Master service and Worker service of SeaTunnel Engine run mixed in the same process, and every node can both run jobs and participate in the election to become master; that is, the master node also runs tasks at the same time. In this mode, the IMap data (which stores task state information to support task fault tolerance) is distributed across all nodes.

Usage Recommendation: It is recommended to use the [separated cluster mode](separated-cluster-deployment.md). In the hybrid cluster mode, the Master node also runs tasks, and when the task scale is large this affects the stability of the Master node. Once the Master node crashes or its heartbeat times out, a master switch occurs, and the switch forces all running jobs to perform fault tolerance, further increasing the load on the cluster. Therefore, we recommend using the [separated cluster mode](separated-cluster-deployment.md).
Member left a comment:

Can we mark the separated cluster mode as an experimental feature? When it is production-ready, we can change it to the recommended mode.

# Other configurations
```

### 4.2_slot configuration
Member left a comment:

Suggested change
### 4.2_slot configuration
### 4.2 Slot Configuration

@@ -38,6 +38,12 @@ public class ServerCommandArgs extends CommandArgs {
description = "The cluster daemon mode")
private boolean daemonMode = false;

@Parameter(
names = {"-r", "--rule"},
Member left a comment:

Suggested change
names = {"-r", "--rule"},
names = {"-r", "--role"},

@@ -0,0 +1,295 @@
/*
Member left a comment:

Can we add some comment blocks to quickly locate the code we have modified, so that we will not lose these changes when upgrading in the future?
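
The reviewer's concern suggests the new file copies an upstream class. One common convention for such markers (illustrative only; the field between the markers is a placeholder, not the PR's actual change) is to bracket every divergence from the copied upstream code:

```java
// ---- SeaTunnel modification begin (#6947: master/worker split mode) ----
// Any added or edited lines live between these markers, so a diff against the
// upstream class shows exactly what must be re-applied on upgrade.
private boolean splitModeEnabled = false;
// ---- SeaTunnel modification end ----
```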

@@ -37,5 +37,11 @@ hazelcast:
hazelcast.invocation.max.retry.count: 20
hazelcast.tcp.join.port.try.count: 30
hazelcast.logging.type: log4j2
hazelcast.operation.generic.thread.count: 50
hazelcast.operation.generic.thread.count: 100
Member left a comment:

Why change the default value of the thread count? Threads in the operation thread pool are never released, so if there is no need, don't increase it.

@@ -450,6 +463,8 @@ private boolean prepareRestorePipeline() {
reset();
jobMaster.getCheckpointManager().reportedPipelineRunning(pipelineId, false);
jobMaster.getPhysicalPlan().addPipelineEndCallback(this);
log.info("Wait {}s and then restore the pipeline ", pipelineRestoreIntervalSeconds);
Member left a comment:

Please add the job id and pipeline id to this log message.
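
A sketch of what the enriched log line could look like; `jobMaster.getJobId()` is an assumed accessor and may differ from the real API:

```java
// Hypothetical form of the requested log line; getJobId() is an assumption.
log.info(
        "Job {} pipeline {}: wait {}s and then restore the pipeline",
        jobMaster.getJobId(),
        pipelineId,
        pipelineRestoreIntervalSeconds);
```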

Thread.sleep(pipelineRestoreIntervalSeconds);
Member left a comment:

Suggested change
Thread.sleep(pipelineRestoreIntervalSeconds);
Thread.sleep(pipelineRestoreIntervalSeconds * 1000);
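
The underlying bug is that `Thread.sleep` takes milliseconds while `pipelineRestoreIntervalSeconds` is configured in seconds, so the original line slept for 20 ms where 20 s was intended. Besides multiplying by 1000 as suggested, an equivalent sketch (not necessarily what the PR ended up with) makes the unit explicit with `java.util.concurrent.TimeUnit`:

```java
import java.util.concurrent.TimeUnit;

// TimeUnit performs the seconds-to-milliseconds conversion internally and
// makes the intended unit explicit at the call site.
TimeUnit.SECONDS.sleep(pipelineRestoreIntervalSeconds);
```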
