Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nova Linux build job runs as root inside its Docker container #5091

Open
huydhn opened this issue Apr 12, 2024 · 7 comments
Open

Nova Linux build job runs as root inside its Docker container #5091

huydhn opened this issue Apr 12, 2024 · 7 comments
Labels
bug Something isn't working triage review

Comments

@huydhn
Copy link
Contributor

huydhn commented Apr 12, 2024

The container in question https://hub.docker.com/r/pytorch/manylinux-builder/tags.

There are reports from ExecuTorch and Torchtune about the problematic way of running Nova Linux build job as root inside the container.

  1. On ExecuTorch, buck2 refuses to run as root https://github.com/pytorch/executorch/actions/runs/8655199802/job/23733768589#step:14:125, which blocks ET from using the workflow. For the context, ET use buck2 to gather the source files before passing them to cmake.
  2. On TorchTune, running the build as root leaves some artifacts like the conda folder on the runner, which could prevent subsequent jobs to clean it up as @clee2000 discovered it https://github.com/pytorch/torchtune/actions/runs/8639310064/job/23685409883?pr=688.

Usually, running thing as root is bad, and we should figure out a way to not do it anymore

cc @seemethere @atalman @kit1980 @clee2000 @dbort @malfet

@huydhn huydhn added bug Something isn't working high priority labels Apr 12, 2024
@clee2000
Copy link
Contributor

clee2000 commented Apr 16, 2024

cc @atalman for NOVA workflows

@malfet: This sort of runner clean up shoud be done by the runner and not in the workflows, ARC should be able to do this cc @DanilBaibak @jeanschmidt

@huydhn
Copy link
Contributor Author

huydhn commented Apr 30, 2024

AI: To figure out a POC going forward for Nova workflow.

@huydhn
Copy link
Contributor Author

huydhn commented Apr 30, 2024

cc @atalman

@dbort
Copy link
Contributor

dbort commented May 3, 2024

pytorch/executorch#3502 is caused by the hack I added to executorch to work around this "running as root" issue: buck2 does not like running as root.

dbort added a commit to dbort/executorch-1 that referenced this issue May 3, 2024
This hack is required to work around pytorch/test-infra#5091, which
runs some CI jobs as root, which buck2 doesn't like.

But we saw in pytorch#3502 that this can break things for some normal users.

Reduce the blast radius of this hack, only modifying HOME when actually
running as root.
facebook-github-bot pushed a commit to pytorch/executorch that referenced this issue May 4, 2024
Summary:
This hack is required to work around pytorch/test-infra#5091, which runs some CI jobs as root, which buck2 doesn't like.

But we saw in #3502 that this can break things for some normal users.

Reduce the blast radius of this hack, only modifying HOME when actually running as root.

Mitigates #3502

Pull Request resolved: #3507

Test Plan:
`./install_requirements.sh` succeeded locally.

The build-wheels jobs for this PR do not break during the buck2 phase, and show that they're unsetting HOME.

https://github.com/pytorch/executorch/actions/runs/8946028682/job/24575999986?pr=3507#step:14:118
```
2024-05-03T23:32:08.9616508Z temporarily unsetting HOME while running as root
```

https://github.com/pytorch/executorch/actions/runs/8946028682/job/24575999986?pr=3507#step:14:465
```
2024-05-03T23:32:08.9914557Z restored HOME
```

Reviewed By: larryliu0820

Differential Revision: D56958571

Pulled By: dbort

fbshipit-source-id: c7c6abdd52361af8253ce068002e3c23dee16f6b
dbort added a commit to dbort/executorch-1 that referenced this issue May 4, 2024
This hack is required to work around pytorch/test-infra#5091, which
runs some CI jobs as root, which buck2 doesn't like.

But we saw in pytorch#3502 that this can break things for some normal users.

Reduce the blast radius of this hack, only modifying HOME when actually
running as root.
mcr229 pushed a commit to pytorch/executorch that referenced this issue May 21, 2024
This hack is required to work around pytorch/test-infra#5091, which
runs some CI jobs as root, which buck2 doesn't like.

But we saw in #3502 that this can break things for some normal users.

Reduce the blast radius of this hack, only modifying HOME when actually
running as root.
@huydhn
Copy link
Contributor Author

huydhn commented Jun 11, 2024

This issue has been partially resolved now that we have a hook to chown everything back to the correct runner user.

@huydhn huydhn closed this as completed Jun 11, 2024
@dbort
Copy link
Contributor

dbort commented Jun 11, 2024

The files are now chowned, but does the job still run with root as the user?

dbort added a commit to dbort/executorch-1 that referenced this issue Jun 12, 2024
Some jobs ran as root, which buck2 doesn't like. We were able to
work around it by unsetting HOME while running.

But now that pytorch/test-infra#5091 is fixed,
thost jobs will not run as root, so we can remove this hack.

Test Plan:
Ran `./install_requirements.sh --pybind xnnpack` on my macbook and it
built/installed successfully.
@huydhn
Copy link
Contributor Author

huydhn commented Jun 12, 2024

Checking with @atalman on this, and we have no plan to fix the run by root part atm. But, the files are now chowned to the correct user before and after the job finishes. I close the issue because there is already the buck workaround on ET. However, if you plan to remove that patch eventually, I could keep the issue open for a future fix.

@huydhn huydhn reopened this Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage review
Projects
Status: Cold Storage
Development

No branches or pull requests

3 participants