Nova Linux build job runs as root inside its Docker container #5091
Comments
cc @atalman for NOVA workflows @malfet: This sort of runner cleanup should be done by the runner and not in the workflows. ARC should be able to do this. cc @DanilBaibak @jeanschmidt
AI: Figure out a POC going forward for the Nova workflow.
cc @atalman
pytorch/executorch#3502 is caused by the hack I added to executorch to work around this "running as root" issue: buck2 does not like running as root.
This hack is required to work around pytorch/test-infra#5091, which runs some CI jobs as root, which buck2 doesn't like. But we saw in pytorch#3502 that this can break things for some normal users. Reduce the blast radius of this hack, only modifying HOME when actually running as root.
Summary: This hack is required to work around pytorch/test-infra#5091, which runs some CI jobs as root, which buck2 doesn't like. But we saw in #3502 that this can break things for some normal users. Reduce the blast radius of this hack, only modifying HOME when actually running as root.

Mitigates #3502

Pull Request resolved: #3507

Test Plan: `./install_requirements.sh` succeeded locally. The build-wheels jobs for this PR do not break during the buck2 phase, and show that they're unsetting HOME.

https://github.com/pytorch/executorch/actions/runs/8946028682/job/24575999986?pr=3507#step:14:118
```
2024-05-03T23:32:08.9616508Z temporarily unsetting HOME while running as root
```
https://github.com/pytorch/executorch/actions/runs/8946028682/job/24575999986?pr=3507#step:14:465
```
2024-05-03T23:32:08.9914557Z restored HOME
```

Reviewed By: larryliu0820

Differential Revision: D56958571

Pulled By: dbort

fbshipit-source-id: c7c6abdd52361af8253ce068002e3c23dee16f6b
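For readers who want to see the shape of the "only modify HOME when actually running as root" idea, here is a minimal Python sketch. It is not the code from pytorch/executorch#3507, which is not reproduced in this thread. The function name `home_unset_if_root` and its `is_root` parameter are hypothetical and exist only for illustration. The two log messages mirror the ones quoted in the test plan above.

```python
import contextlib
import os


@contextlib.contextmanager
def home_unset_if_root(is_root=None):
    """Temporarily remove HOME from the environment, but only when running as root.

    Hypothetical helper: `is_root` can be passed explicitly for testing;
    by default it is detected from the effective user ID.
    """
    if is_root is None:
        # os.geteuid() only exists on POSIX; treat other platforms as non-root.
        is_root = hasattr(os, "geteuid") and os.geteuid() == 0

    saved_home = None
    if is_root and "HOME" in os.environ:
        saved_home = os.environ.pop("HOME")
        print("temporarily unsetting HOME while running as root")
    try:
        yield
    finally:
        # Restore HOME only if this context manager removed it.
        if saved_home is not None:
            os.environ["HOME"] = saved_home
            print("restored HOME")
```

A build step that shells out to buck2 could be wrapped in `with home_unset_if_root(): ...`. For a non-root user the environment is left untouched, which is the "reduced blast radius" described above.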
This issue has been partially resolved now that we have a hook to chown everything back to the correct runner user.
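The hook itself is not shown in this thread, so the following is only a rough Python sketch of what "chown everything back" could look like, under the assumption that the hook walks a workspace directory and resets ownership. The function name `chown_tree` and its parameters are hypothetical. Only standard-library calls (`os.walk`, `os.chown`) are used.

```python
import os


def chown_tree(root, uid, gid):
    """Recursively set ownership of `root` and everything under it to uid:gid.

    Returns the number of paths whose ownership was set.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        # follow_symlinks=False changes a symlink itself rather than its target.
        os.chown(path, uid, gid, follow_symlinks=False)
    return len(paths)
```

Changing ownership to a different user generally requires root, so a hook like this would have to run with elevated privileges before and after the job.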
The files are now chowned, but does the job still run with |
Some jobs ran as root, which buck2 doesn't like. We were able to work around it by unsetting HOME while running. But now that pytorch/test-infra#5091 is fixed, those jobs will not run as root, so we can remove this hack. Test Plan: Ran `./install_requirements.sh --pybind xnnpack` on my MacBook and it built/installed successfully.
I checked with @atalman on this, and we have no plan to fix the run-as-root part at the moment. But the files are now chowned to the correct user before and after the job finishes. I'm closing the issue because there is already the buck workaround on ET. However, if you plan to remove that patch eventually, I could keep the issue open for a future fix.
The container in question: https://hub.docker.com/r/pytorch/manylinux-builder/tags
There are reports from ExecuTorch and Torchtune of problems caused by the Nova Linux build job running as root inside the container.
Running things as root is usually bad practice, and we should figure out a way to stop doing it.
cc @seemethere @atalman @kit1980 @clee2000 @dbort @malfet