
[E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running LayerNormalization node. #21012

Open
Jose17-ml opened this issue Jun 12, 2024 · 2 comments
Labels
ep:DML (issues related to the DirectML execution provider), feature request (request for unsupported feature or enhancement), model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), platform:windows (issues related to the Windows platform)

Comments

@Jose17-ml

Describe the feature request

Hi Experts,

I started working on AI/ML recently. I'm currently trying to run a Hugging Face Optimum model on the GPU using the DirectML execution provider (DML EP).

Platform: Windows 11

Model: https://huggingface.co/optimum/m2m100_418M

Changes:

import onnxruntime
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Verbose ONNX Runtime logging
session_opt = onnxruntime.SessionOptions()
session_opt.log_severity_level = 0

# provider = "CPUExecutionProvider"
provider = "DmlExecutionProvider"
NUM_ITERATIONS = 1

model_name = "optimum/m2m100_418M"

hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
chinese_text = "生活就像一盒巧克力。"

# Load the ONNX model with the selected execution provider
model = ORTModelForSeq2SeqLM.from_pretrained(model_name, provider=provider, session_options=session_opt)
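
For reference, here is a minimal generation sketch of what "run the model" means on my side, pieced together from the m2m100_418M model card (the tokenizer setup, the source/target languages, and the print are my assumptions, not part of the changes above):

from transformers import AutoTokenizer

# Tokenizer for the same checkpoint; source language is Hindi for hi_text
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.src_lang = "hi"

# Translate the Hindi sample to French (target language chosen as an example)
encoded_hi = tokenizer(hi_text, return_tensors="pt")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))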

When I use "DmlExecutionProvider", I see the error below:

2024-06-12 14:35:21.2694023 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running LayerNormalization node. Name:'/model/decoder/layer_norm/Mul/LayerNormFusion/' Status Message: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2468)\onnxruntime_pybind11_state.pyd!00007FFA9B5A09BF: (caller: 00007FFA9B5A2174) Exception(3) tid(1ff4) 887A0005 The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.

With "CPUExecutionProvider", however, I don't see any issue and am able to run the model successfully.

So I need your help to resolve this issue and run the model with the DML EP.
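
In case it helps narrow things down, below is the diagnostic sketch I plan to try next: confirm the DirectML build of ONNX Runtime is the one installed, and pin the DML EP to a specific adapter. The device_id value and passing provider_options this way are assumptions on my side, not something I have confirmed against the Optimum API.

# Check that the onnxruntime-directml build is being used
print(onnxruntime.get_available_providers())

# Pin the DML EP to a specific adapter; device_id 0 is an assumption,
# other indices may be needed on a multi-GPU machine
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_name,
    provider="DmlExecutionProvider",
    provider_options={"device_id": 0},
    session_options=session_opt,
)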

Thanks

Describe scenario use case

Trying to run a Hugging Face Optimum model with the DML EP.

@Jose17-ml added the feature request label Jun 12, 2024
@github-actions bot added the ep:DML, model:transformer, and platform:windows labels Jun 12, 2024
@Jose17-ml (Author)

Hi Experts,

Need your inputs.

@Jose17-ml (Author)

Hi,

Any inputs?
