
[E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running LayerNormalization node. #21012

Open
Jose17-ml opened this issue Jun 12, 2024 · 2 comments
Labels
ep:DML (issues related to the DirectML execution provider), feature request (request for unsupported feature or enhancement), model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), platform:windows (issues related to the Windows platform)

Comments

@Jose17-ml

Describe the feature request

Hi Experts,

I started working on AI/ML recently. I'm currently trying to run a Hugging Face Optimum model on the GPU using the DirectML execution provider (DML EP).

Platform: Windows 11

Model: https://huggingface.co/optimum/m2m100_418M

Changes:

import onnxruntime
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Verbose ONNX Runtime logging
session_opt = onnxruntime.SessionOptions()
session_opt.log_severity_level = 0

# provider = "CPUExecutionProvider"
provider = "DmlExecutionProvider"
NUM_ITERATIONS = 1

model_name = "optimum/m2m100_418M"

hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
chinese_text = "生活就像一盒巧克力。"

# Load the ONNX model with the selected execution provider
model = ORTModelForSeq2SeqLM.from_pretrained(model_name, provider=provider, session_options=session_opt)
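
For reference, here is a minimal generation sketch of what "run the model" means on my side, pieced together from the m2m100_418M model card (the tokenizer setup, the source/target languages, and the print are my assumptions, not part of the changes above):

from transformers import AutoTokenizer

# Tokenizer for the same checkpoint; source language is Hindi for hi_text
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.src_lang = "hi"

# Translate the Hindi sample to French (target language chosen as an example)
encoded_hi = tokenizer(hi_text, return_tensors="pt")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))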

When I use "DmlExecutionProvider", I see the error below:

2024-06-12 14:35:21.2694023 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running LayerNormalization node. Name:'/model/decoder/layer_norm/Mul/LayerNormFusion/' Status Message: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2468)\onnxruntime_pybind11_state.pyd!00007FFA9B5A09BF: (caller: 00007FFA9B5A2174) Exception(3) tid(1ff4) 887A0005 The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.

With "CPUExecutionProvider", however, I don't see any issue and am able to run the model successfully.

So I need your help to resolve this issue and run the model with the DML EP.
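
In case it helps narrow things down, below is the diagnostic sketch I plan to try next: confirm the DirectML build of ONNX Runtime is the one installed, and pin the DML EP to a specific adapter. The device_id value and passing provider_options this way are assumptions on my side, not something I have confirmed against the Optimum API.

# Check that the onnxruntime-directml build is being used
print(onnxruntime.get_available_providers())

# Pin the DML EP to a specific adapter; device_id 0 is an assumption,
# other indices may be needed on a multi-GPU machine
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_name,
    provider="DmlExecutionProvider",
    provider_options={"device_id": 0},
    session_options=session_opt,
)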

Thanks

Describe scenario use case

Trying to run a Hugging Face Optimum model with the DML EP.

@Jose17-ml added the feature request label Jun 12, 2024
@github-actions bot added the ep:DML, model:transformer, and platform:windows labels Jun 12, 2024
@Jose17-ml (Author)

Hi Experts,

Need your inputs.

@Jose17-ml (Author)

Hi,

Any inputs?
