[BUG]: docker build cuda extension error #5732

Open
1 task done
apachemycat opened this issue May 20, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@apachemycat

Is there an existing issue for this bug?

  • I have searched the existing issues

🐛 Describe the bug

When docker build runs the following command:
RUN BUILD_EXT=1 pip install colossalai-nightly

it fails with:
RuntimeError: [extension] Could not find any kernel compatible with the current environment.
However, if I run the same command inside a running container (started with the GPU flag so it can use the GPU cards), it succeeds.
Base image:
FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu20.04

Environment

No response

apachemycat added the bug label on May 20, 2024

@apachemycat
Author

RUN BUILD_EXT=1 pip install colossalai-nightly:
#0 7.666 Collecting colossalai-nightly
#0 12.99 Downloading colossalai-nightly-2024.5.18.tar.gz (1.2 MB)
#0 13.69 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 1.7 MB/s eta 0:00:00
#0 14.21 Preparing metadata (setup.py): started
#0 17.77 Preparing metadata (setup.py): finished with status 'error'
#0 17.78 error: subprocess-exited-with-error
#0 17.78
#0 17.78 × python setup.py egg_info did not run successfully.
#0 17.78 │ exit code: 1
#0 17.78 ╰─> [7 lines of output]
#0 17.78 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#0 17.78 Traceback (most recent call last):
#0 17.78 File "<string>", line 2, in <module>
#0 17.78 File "<pip-setuptools-caller>", line 34, in <module>
#0 17.78 File "/tmp/pip-install-4b2qtsp_/colossalai-nightly_41088cca51d34a4e95a34fd3ef65987c/setup.py", line 90, in <module>
#0 17.78 raise RuntimeError("[extension] Could not find any kernel compatible with the current environment.")
#0 17.78 RuntimeError: [extension] Could not find any kernel compatible with the current environment.
#0 17.78 [end of output]
#0 17.78

@apachemycat
Author

one
Created wheel for colossalai-nightly: filename=colossalai_nightly-2024.5.18-cp310-cp310-linux_x86_64.whl size=23673844 sha256=0a0bb55154c1ce9758ff8f9dd4b38e4b462647ad38e9714d9a4b2de6153b163e
Stored in directory: /root/.cache/pip/wheels/ef/39/0e/39263ec364cb9d67240001279c9bcb1808b102252ea4ecaf33
Building wheel for contexttimer (setup.py) ... done
Created wheel for contexttimer: filename=contexttimer-0.3.3-py3-none-any.whl size=5804 sha256=877270da42acb2811b2b5fbb097ce315895a4f6ed3b4da34aa5318a60c758006
Stored in directory: /root/.cache/pip/wheels/72/1c/da/cfd97201d88ccce214427fa84a5caeb91fef7c5a1b4c4312b4
Successfully built colossalai-nightly contexttimer
Installing collected packages: ninja, distlib, contexttimer, wrapt, virtualenv, pydantic-core, nodeenv, msgpack, invoke, identify, cfgv, bcrypt, annotated-types, pynacl, pydantic, pre-commit, google, deprecated, cryptography, tokenizers, paramiko, transformers, ray, fabric, galore_torch, colossalai-nightly
Attempting uninstall: tokenizers
Found existing installation: tokenizers 0.19.1
Uninstalling tokenizers-0.19.1:
Successfully uninstalled tokenizers-0.19.1
Attempting uninstall: transformers
Found existing installation: transformers 4.42.0.dev0
Uninstalling transformers-4.42.0.dev0:
Successfully uninstalled transformers-4.42.0.dev0
Successfully installed annotated-types-0.6.0 bcrypt-4.1.3 cfgv-3.4.0 colossalai-nightly-2024.5.18 contexttimer-0.3.3 cryptography-42.0.7 deprecated-1.2.14 distlib-0.3.8 fabric-3.2.2 galore_torch-1.0 google-3.0.0 identify-2.5.36 invoke-2.2.0 msgpack-1.0.8 ninja-1.11.1.1 nodeenv-1.8.0 paramiko-3.4.0 pre-commit-3.7.1 pydantic-2.7.1 pydantic-core-2.18.2 pynacl-1.5.0 ray-2.22.0 tokenizers-0.15.2 transformers-4.36.2 virtualenv-20.26.2 wrapt-1.16.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@71c5383a668b:/app#
root@71c5383a668b:/app#

The output above is from running the same install inside a container started with the GPU device parameter, where it succeeds.
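
The line "No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'" in the failing log suggests that the build step could find the CUDA toolkit but no usable GPU, whereas the GPU-enabled container could see one. A minimal sketch for comparing the two environments is shown here. The helper name describe_cuda_build_env is hypothetical and not part of ColossalAI; it only reads environment variables, looks for nvidia-smi on the PATH, and asks PyTorch (if importable) whether a GPU is visible. It does not reproduce the actual check in ColossalAI's setup.py.

```python
import os
import shutil


def describe_cuda_build_env(environ=None):
    """Collect a few facts about CUDA visibility in the current environment.

    Hypothetical diagnostic helper for illustration; not part of ColossalAI.
    """
    env = os.environ if environ is None else environ
    # Fall back to the path reported in the failing log when CUDA_HOME is unset.
    cuda_home = env.get("CUDA_HOME", "/usr/local/cuda")
    report = {
        "BUILD_EXT": env.get("BUILD_EXT"),
        "FORCE_CUDA": env.get("FORCE_CUDA"),
        "cuda_home": cuda_home,
        "toolkit_present": os.path.isdir(cuda_home),
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_sees_gpu": None,  # None means torch could not be imported
    }
    try:
        import torch

        report["torch_sees_gpu"] = bool(torch.cuda.is_available())
    except Exception:
        # torch missing or failing to load; leave the value as None
        pass
    return report


if __name__ == "__main__":
    for key, value in describe_cuda_build_env().items():
        print(f"{key}: {value}")
```

Running the script once in a RUN step of the Dockerfile and once in the GPU-enabled container should show which of these values differ between the two cases.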

@GrannyProgramming

Workaround

Install ColossalAI from a specific commit:

ARG VERSION=main
RUN git clone -b ${VERSION} https://github.com/hpcaitech/ColossalAI.git && \
    cd ColossalAI && \
    git checkout 3e05c07 && \
    BUILD_EXT=1 pip install -v --no-cache-dir . && \
    cd .. && \
    rm -rf ColossalAI

@apachemycat
Author

thanks

@ver217
Member

ver217 commented Jun 12, 2024

This happens because Docker BuildKit is not compatible with the current CUDA extension build. You can run export FORCE_CUDA=1 before installing colossalai in Docker. Alternatively, you can disable Docker BuildKit by running export DOCKER_BUILDKIT=0 before the build.
