[Bug] [FTP Connector] The reading and writing of FTP are very slow #7048

Open
3 tasks done
Xuzhengz opened this issue Jun 21, 2024 · 1 comment

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

Reading from and writing to FTP is very slow. Connecting to the FTP server directly takes only a few seconds, so I have ruled out a slow connection as the cause. When reading, creating the task takes a while, and assigning the FTP read task to subtasks is also slow. When writing, the classloader release step keeps repeating; only one record is written, yet the task takes several minutes to complete.
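To help separate server-side latency from time spent inside the connector, the individual FTP steps can be timed outside SeaTunnel. The following is a minimal sketch using Python's standard `ftplib`, not part of SeaTunnel; the function name `time_ftp_steps`, the `client_factory` parameter, and the `probe.txt` file name are assumptions made for illustration. Note that it uploads a small probe file to the given directory.

```python
import time
from ftplib import FTP
from io import BytesIO


def time_ftp_steps(host, port, user, password, remote_dir,
                   client_factory=FTP, probe_name="probe.txt", timeout=30):
    """Time each FTP step separately and return {step: seconds}.

    client_factory and probe_name are hypothetical helpers for this sketch;
    client_factory must return an object with the ftplib.FTP interface.
    """
    timings = {}
    ftp = client_factory()  # FTP() without arguments does not connect yet

    def timed(label, func, *args):
        # Run one FTP call and record how long it took.
        start = time.perf_counter()
        func(*args)
        timings[label] = time.perf_counter() - start

    try:
        timed("connect", ftp.connect, host, port, timeout)
        timed("login", ftp.login, user, password)
        timed("cwd", ftp.cwd, remote_dir)
        timed("list", ftp.nlst)
        # Upload a tiny file so the write path is measured as well.
        timed("store", ftp.storbinary, "STOR " + probe_name, BytesIO(b"probe\n"))
    finally:
        try:
            ftp.quit()
        except Exception:
            # quit() fails if the session never opened or already dropped.
            ftp.close()
    return timings
```

Calling it with the same host, port, user, password, and `path` as in the sink config, then printing the returned dictionary, shows which step (if any) is slow on the server side. If every step finishes in well under a second, the delay is more likely inside the job than in the FTP server.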

SeaTunnel Version

dev-2.3.6

SeaTunnel Config

{
    "env": {
        "job.name": "XML file output",
        "job.mode": "batch"
    },
    "preHandler": [

    ],
    "source": [
        {
            "plugin_name": "Jdbc",
            "driver": "com.mysql.cj.jdbc.Driver",
            "connection_check_timeout_sec": 100,
            "table_list": [
                {
                    "table_path": "test_data.device",
                    "query": "SELECT\n `device_id`,\n `name`,\n `type`,\n `longitude`,\n `latitude`,\n `height`,\n `radius`,\n `distance`,\n `service_address`,\n `status`,\n `term_type`,\n `properties`,\n `runway_name`,\n `direction`,\n `runway_code`,\n `delay`\nFROM\n `device`"
                }
            ],
            "database": "test_data",
            "url": "jdbc:mysql://******:3306/test_data?remarks=true&useInformationSchema=true&useCursorFetch=true&defaultFetchSize=2048&rewriteBatchedStatements=true",
            "user": "******",
            "password": "******",
            "result_table_name": "ot_b7ba264ac3a84eb4b4d1b3bb93373a20"
        }
    ],
    "transform": [

    ],
    "sink": [
        {
            "file_format_type": "xml",
            "custom_filename": true,
            "file_name_expression": "xml_test",
            "is_enable_transaction": false,
            "xml_root_tag": "RECORDS",
            "xml_row_tag": "RECORD",
            "xml_use_attr_format": false,
            "batch_size": 1000000000,
            "plugin_name": "FtpFile",
            "host": "******",
            "port": "******",
            "user": "******",
            "password": "******",
            "tmp_path": "/ottomi/tmp/ottomi",
            "path": "/ottomi/file-node/download/1793861143369256962/xml/",
            "result_table_name": "ot_16aad011b9314e15977921dac312ca5f",
            "source_table_name": [
                "ot_b7ba264ac3a84eb4b4d1b3bb93373a20"
            ]
        }
    ]
}

Running Command

bin/seatunnel.sh -c ftp.json

Error Exception

The job handles only a small amount of data, yet it takes several minutes to complete, or hangs for a long time with no response until the client disconnects:



java.lang.RuntimeException: org.apache.hadoop.fs.ftp.FTPException: Client not connected
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:262)
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:68)
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70)
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
at

Zeta or Flink or Spark Version

No response

Java or Scala Version

1.8

Screenshots

image

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Xuzhengz Xuzhengz added the bug label Jun 21, 2024
@Xuzhengz
Author

By comparison, other file read/write plugins such as S3 and local file are fast; only FTP is particularly slow.
