[Feature][Zeta] Add Master and Worker split mode deployment #6947
base: dev
great
...src/main/java/org/apache/seatunnel/engine/server/resourcemanager/ResourceManagerFactory.java
...gine-server/src/main/java/org/apache/seatunnel/engine/server/LiteNodeDropOutTcpIpJoiner.java
...unnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/CoordinatorService.java
...unnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/CoordinatorService.java
...nel-engine-server/src/main/java/org/apache/seatunnel/engine/server/SeaTunnelNodeContext.java
...nel-engine-server/src/main/java/org/apache/seatunnel/engine/server/dag/physical/SubPlan.java
LGTM if CI passes.
@@ -0,0 +1,70 @@
---
How does this differ from https://github.com/apache/seatunnel/blob/dev/docs/en/start-v2/locally/deployment.md?

The Master service and Worker service of SeaTunnel Engine are mixed in the same process, and all nodes can run jobs and participate in the election to become master, that is, the master node is also running synchronous tasks simultaneously. In this mode, the Imap (which saves the status information of the task to provide support for the task's fault tolerance) data will be distributed across all nodes.

Usage Recommendation: It is recommended to use the [separated cluster mode](separated-cluster-deployment.md). In the hybrid cluster mode, the Master node needs to run tasks synchronously. When the task scale is large, it will affect the stability of the Master node. Once the Master node crashes or the heartbeat times out, it will cause the Master node to switch, and the Master node switch will cause all running tasks to perform fault tolerance, further increasing the load on the cluster. Therefore, we recommend using the [separated cluster mode](separated-cluster-deployment.md).
Can we mark the separated cluster mode as an experimental feature? Once it is ready for production, we can change it to the recommended mode.
# Other configurations | ||
``` | ||
|
||
### 4.2_slot configuration
Suggested change:
- ### 4.2_slot configuration
+ ### 4.2 Slot Configuration
@@ -38,6 +38,12 @@ public class ServerCommandArgs extends CommandArgs {
description = "The cluster daemon mode")
private boolean daemonMode = false;

@Parameter(
names = {"-r", "--rule"},
Suggested change:
- names = {"-r", "--rule"},
+ names = {"-r", "--role"},
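For context on why the long option should read `--role`: the PR description says the value passed to `-r` is one of `master_and_worker`, `master`, or `worker`. A minimal sketch of validating such a value is below. The class `NodeRoleSketch`, the enum `NodeRole`, and the fallback to `master_and_worker` when no role is given are assumptions made for illustration; they are not taken from the PR's code.

```java
import java.util.Locale;

public class NodeRoleSketch {
    // Hypothetical enum; the constants mirror the role values listed in the PR description.
    public enum NodeRole { MASTER_AND_WORKER, MASTER, WORKER }

    // Parses the raw -r/--role value. Falling back to MASTER_AND_WORKER for a
    // missing value is an assumption for this sketch, not confirmed behavior.
    public static NodeRole parseRole(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return NodeRole.MASTER_AND_WORKER;
        }
        try {
            return NodeRole.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown role '" + raw + "', expected master_and_worker, master or worker", e);
        }
    }
}
```

Rejecting unknown values at startup keeps a typo in the role from silently starting a node in the wrong mode.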
@@ -0,0 +1,295 @@
/*
Can we add comment blocks that mark the code we have modified, so the changes are easy to locate and are not lost when we upgrade in the future?
@@ -37,5 +37,11 @@ hazelcast:
hazelcast.invocation.max.retry.count: 20
hazelcast.tcp.join.port.try.count: 30
hazelcast.logging.type: log4j2
hazelcast.operation.generic.thread.count: 50
hazelcast.operation.generic.thread.count: 100
Why change the default value of the thread count? Threads in the operation thread pool are never released, so if there is no need, don't increase it.
@@ -450,6 +463,8 @@ private boolean prepareRestorePipeline() {
reset();
jobMaster.getCheckpointManager().reportedPipelineRunning(pipelineId, false);
jobMaster.getPhysicalPlan().addPipelineEndCallback(this);
log.info("Wait {}s and then restore the pipeline ", pipelineRestoreIntervalSeconds);
Please add the job ID and pipeline ID to this log message.
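One possible shape for the requested message is sketched below. The helper `restoreMessage`, the class name, and the parameter types (`long` job ID, `int` pipeline ID) are assumptions for illustration. With the SLF4J-style logger already used in the hunk, the equivalent would be extra `{}` placeholders carrying the two IDs.

```java
public class RestoreLogSketch {
    // Hypothetical helper: builds the restore message with the job and pipeline
    // identifiers in front, so the line can be matched to a job when grepping logs.
    public static String restoreMessage(long jobId, int pipelineId, long waitSeconds) {
        return String.format(
                "Job %d, pipeline %d: wait %ds and then restore the pipeline",
                jobId, pipelineId, waitSeconds);
    }
}
```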
@@ -450,6 +463,8 @@ private boolean prepareRestorePipeline() {
reset();
jobMaster.getCheckpointManager().reportedPipelineRunning(pipelineId, false);
jobMaster.getPhysicalPlan().addPipelineEndCallback(this);
log.info("Wait {}s and then restore the pipeline ", pipelineRestoreIntervalSeconds);
Thread.sleep(pipelineRestoreIntervalSeconds);
Suggested change:
- Thread.sleep(pipelineRestoreIntervalSeconds);
+ Thread.sleep(pipelineRestoreIntervalSeconds * 1000);
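`Thread.sleep` takes milliseconds, so passing a value in seconds waits a thousand times less than the log message claims. As an alternative to multiplying by 1000, the unit can be made explicit with `java.util.concurrent.TimeUnit`. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes the interval is held as a number of seconds, as the field name suggests.

```java
import java.util.concurrent.TimeUnit;

public class RestoreDelaySketch {
    // Converts an interval in seconds to the millisecond value Thread.sleep expects.
    public static long toSleepMillis(long intervalSeconds) {
        return TimeUnit.SECONDS.toMillis(intervalSeconds);
    }

    // Sleeps for the given number of seconds; the unit is explicit at the call site.
    public static void waitBeforeRestore(long intervalSeconds) throws InterruptedException {
        TimeUnit.SECONDS.sleep(intervalSeconds);
    }
}
```

If the interval were stored as an `int`, `TimeUnit` would also avoid the integer overflow that `seconds * 1000` can hit for very large values.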
Purpose of this pull request
Zeta Master is separated from the Worker
Does this PR introduce any user-facing change?
Yes. With this PR, users can start a Zeta cluster node with `-r <role>`, where the role can be `master_and_worker`, `master`, or `worker`. More information is available in the documentation included in this PR.
How was this patch tested?
Check list
New License Guide
release-note
.