[Bug] [seatunnel-engine-server] When slot resources are at the capacity boundary, a resource request that should succeed may fail #6973

Open
liangcw1111 opened this issue Jun 12, 2024 · 0 comments · May be fixed by #7049

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

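The cluster consists of 3 servers with seatunnel.engine.slot-service.slot-num=6, so it has 18 slots in total. When 6 slots are already occupied and two jobs that each need 6 slots are submitted in parallel, some of the parallel submissions may fail (not always reproducible, moderate probability) or all of them may fail (lower probability). The log records: apply resource not success, release all already applied resource
A preliminary analysis suggests that, during parallel submission, each job first queries the cluster and pre-assigns the server that each slot should come from. By the time the slots are actually requested, that server's slots may already have been taken entirely by another parallel job, so the actual allocation fails.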
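
The suspected race can be reproduced in isolation with a minimal Java sketch. It is only an illustration of the check-then-act pattern described above, not SeaTunnel's actual resource manager code and not necessarily what the linked fix does. The names SlotRaceSketch, Worker, plan, apply and runTwoJobs are hypothetical. The sketch assumes the 6 occupied slots all sit on one server, and it simulates the parallel submission deterministically by letting both jobs plan against the same snapshot before either one applies its plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SlotRaceSketch {

    /** Hypothetical worker node holding a count of free slots. */
    static final class Worker {
        private final AtomicInteger free;

        Worker(int freeSlots) { this.free = new AtomicInteger(freeSlots); }

        int freeSlots() { return free.get(); }

        /** Atomically takes one slot; returns false if none is left. */
        boolean tryTakeOne() {
            while (true) {
                int current = free.get();
                if (current == 0) return false;
                if (free.compareAndSet(current, current - 1)) return true;
            }
        }

        void releaseOne() { free.incrementAndGet(); }
    }

    /** First-fit plan built from a snapshot of free counts; null if the snapshot lacks capacity. */
    static int[] plan(List<Worker> workers, int needed) {
        int[] seen = new int[workers.size()];
        for (int i = 0; i < seen.length; i++) seen[i] = workers.get(i).freeSlots();
        int[] plan = new int[needed];
        int w = 0;
        for (int i = 0; i < needed; i++) {
            while (w < seen.length && seen[w] == 0) w++;
            if (w == seen.length) return null;
            seen[w]--;
            plan[i] = w;
        }
        return plan;
    }

    /**
     * Applies a plan. Without replanOnConflict, one failed slot rolls back the whole
     * request. With it, a failed slot is retried on any worker that still has a free slot.
     */
    static boolean apply(List<Worker> workers, int[] plan, boolean replanOnConflict) {
        List<Worker> taken = new ArrayList<>();
        for (int idx : plan) {
            Worker chosen = workers.get(idx).tryTakeOne() ? workers.get(idx) : null;
            if (chosen == null && replanOnConflict) {
                for (Worker other : workers) {
                    if (other.tryTakeOne()) { chosen = other; break; }
                }
            }
            if (chosen == null) {
                for (Worker t : taken) t.releaseOne(); // release all already applied slots
                return false;
            }
            taken.add(chosen);
        }
        return true;
    }

    /** Both jobs plan from the same snapshot before either applies, as in a parallel submit. */
    static boolean[] runTwoJobs(boolean replanOnConflict) {
        // 3 servers x 6 slots, 6 already occupied (assumed to be all on the third server).
        List<Worker> workers = Arrays.asList(new Worker(6), new Worker(6), new Worker(0));
        int[] planA = plan(workers, 6);
        int[] planB = plan(workers, 6);
        boolean a = apply(workers, planA, replanOnConflict);
        boolean b = apply(workers, planB, replanOnConflict);
        return new boolean[] {a, b};
    }

    public static void main(String[] args) {
        boolean[] strict = runTwoJobs(false);
        boolean[] retry = runTwoJobs(true);
        System.out.println("stale plan, no retry: A=" + strict[0] + ", B=" + strict[1]);
        System.out.println("stale plan, re-plan on conflict: A=" + retry[0] + ", B=" + retry[1]);
    }
}
```

In this sketch the first run prints A=true, B=false even though 6 slots are still free on the second server, which matches the reported symptom. The second run prints A=true, B=true because a conflict on the planned server falls back to the remaining free slots instead of failing the whole request.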

SeaTunnel Version

2.3.5

SeaTunnel Config

seatunnel:
  engine:
    classloader-cache-mode: true
    backup-count: 1
    print-execution-info-interval: 120
    print-job-metrics-info-interval: 10
    queue-type: blockingqueue
    slot-service:
      dynamic-slot: false
      slot-num: 6
    checkpoint:
      interval: 30000
      timeout: 21474836460
      max-concurrent: 10
      tolerable-failure: 2

Running Command

-Dseatunnel.config=/alidata1/za-seatunnel/apache-seatunnel-2.3.5-SNAPSHOT/config/seatunnel.yaml -Dhazelcast.config=/alidata1/za-seatunnel/apache-seatunnel-2.3.5-SNAPSHOT/config/hazelcast.yaml -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j2.configurationFile=/alidata1/za-seatunnel/apache-seatunnel-2.3.5-SNAPSHOT/config/log4j2.properties -Dseatunnel.logs.path=/alidata1/za-seatunnel/apache-seatunnel-2.3.5-SNAPSHOT/logs -Dseatunnel.logs.file_name=seatunnel-engine-server -Xms3g -Xmx3g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/seatunnel/dump/zeta-server -XX:MaxMetaspaceSize=1g -XX:+UseG1GC -XX:+PrintGCDetails -Xloggc:/alidata1/za-seatunnel/logs/gc.log -XX:+PrintGCDateStamps -XX:MaxGCPauseMillis=5000 -XX:InitiatingHeapOccupancyPercent=50 -XX:+UseStringDeduplication -XX:GCTimeRatio=4 -XX:G1ReservePercent=15 -XX:ConcGCThreads=2 -XX:G1HeapRegionSize=4M

Error Exception

apply resource not success, release all already applied resource

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Hisoka-X self-assigned this Jun 12, 2024
Hisoka-X added this to the 2.3.6 milestone Jun 12, 2024
Hisoka-X linked a pull request Jun 22, 2024 that will close this issue