"cleanup" option does not remove staged files from S3 #5091

Open
stevekm opened this issue Jun 26, 2024 · 3 comments

stevekm (Contributor) commented Jun 26, 2024

Bug report

When using Nextflow with the cleanup = true option, input files staged from S3 are left in the work dir.

Expected behavior and actual behavior

To automatically clean up the work directory after a successful pipeline run, I was hoping that the cleanup option described here might also remove the S3 input files that were staged during pipeline execution. This does not seem to be the case, and the files remain in the work dir under a path such as work/stage-xyz.

You can reproduce this by running a pipeline with input files on S3 and including the option cleanup = true in your nextflow.config file. The contents of the task work dirs are removed, but the staged files remain.
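
For reference, a minimal sketch of the config described above. Only the cleanup setting comes from this report; the rest of a real nextflow.config (executor, work directory, credentials) is omitted here and would depend on your setup.

```groovy
// nextflow.config
// Minimal reproduction sketch: enable automatic cleanup of task work dirs
// after a successful run. Staged S3 inputs under work/stage-* are reported
// to remain despite this setting.
cleanup = true
```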

Environment

  • Nextflow version: 24.04.2
  • Java version: openjdk version "17.0.10" 2024-01-16 LTS
  • Operating system: Ubuntu 22.04

Additional context

Not sure if this is intentional?

bentsherman (Member) commented Jun 26, 2024

The cleanup only iterates through the task directories, which is why it doesn't delete those stage directories. In fact, I don't think the cleanup works at all on S3 (see #3645).

You can use nf-boost which has an experimental cleanup that is more efficient, but I haven't implemented cleanup for the stage directories.

If I recall correctly, each run has its own stage directory of the pattern work/stage-${sessionId}, so a simple solution would be to just delete that directory at the end. A more aggressive solution would be to delete individual subdirectories as soon as they aren't needed anymore, but I'm not sure how difficult that would be.

stevekm (Contributor, Author) commented Jun 26, 2024

Thanks. I was hoping for some solution that could be bundled inside of the nextflow.config so that it would get run automatically. I will try out nf-boost as well, though I would still want some way to "un-stage" the S3 files at the end of the pipeline.

bentsherman (Member) commented

You might be able to do it with a workflow onComplete handler in the config file. Something like this:

// nextflow.config
// Delete this run's stage directory once the workflow completes.
workflow.onComplete = {
    workDir.resolve("stage-${workflow.sessionId}").deleteDir()
}

See also: https://nextflow.io/docs/latest/metadata.html#decoupling-metadata
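
Since the original request was about cleaning up after a successful run, a variant of the handler could guard the deletion on the run's outcome. This is an untested sketch, not a confirmed fix: it assumes the stage directory really follows the stage-${sessionId} pattern recalled earlier in the thread, and it uses workflow.workDir (the run's work directory from the workflow metadata) in place of the bare workDir, which is an assumption on my part.

```groovy
// nextflow.config
// Sketch only: remove this run's stage directory when the run succeeded.
// Assumes the directory is named "stage-<sessionId>" inside the work dir.
workflow.onComplete = {
    if( workflow.success ) {
        // workflow.workDir is the run's work directory; deleteDir() removes
        // the directory and its contents
        workflow.workDir.resolve("stage-${workflow.sessionId}").deleteDir()
    }
}
```

Leaving the staged files in place on failure keeps them available for inspection, at the cost of having to remove them by hand later.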
