
Run all Nodes on GPU/DML with DML-EP #21013

Open

Jose17-ml opened this issue Jun 12, 2024 · 3 comments
Labels
ep:DML issues related to the DirectML execution provider feature request request for unsupported feature or enhancement model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. platform:windows issues related to the Windows platform

Comments

Jose17-ml commented Jun 12, 2024

Describe the feature request

I tried to run Optimum models with the DML EP on my Windows PC, for example optimum/vit-base-patch16-224 on Hugging Face:

model = ORTModelForImageClassification.from_pretrained(model_name, provider="DmlExecutionProvider")

onnx 1.16.1
onnxruntime 1.18.0
onnxruntime-directml 1.18.0
optimum 1.20.0
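
For reference, this is roughly how I create the session; the log_severity_level setting is how I believe the verbose per-node assignment messages below are produced (0 = verbose):

import onnxruntime as ort
from optimum.onnxruntime import ORTModelForImageClassification

# Verbose logging so ONNX Runtime prints which EP each node is assigned to
# (the VerifyEachNodeIsAssignedToAnEp lines shown below).
sess_options = ort.SessionOptions()
sess_options.log_severity_level = 0  # 0 = verbose

model = ORTModelForImageClassification.from_pretrained(
    "optimum/vit-base-patch16-224",
    provider="DmlExecutionProvider",
    session_options=sess_options,
)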

I see that nodes are distributed between the CPU EP and the DML EP. I also noticed that different instances of the same operator are placed on both DML and CPU.

From the verbose logs:

2024-06-05 11:11:22.1833502 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335

2024-06-05 11:11:22.2061078 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)

2024-06-05 11:11:22.8286509 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [CPUExecutionProvider]. Number of nodes: 9
2024-06-05 11:11:22.8322004 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_7)

Take the Concat operator as an example. I believe this operator is supported on DML (Concat_25 is placed on DML), so why is the Concat_7 instance of the same operator placed on the CPU?

Why are a few node instances placed on the CPU even though DML supports those operators?

I mentioned the Concat node only as an example; in the full log I see the same behavior with other operators such as Gather, Squeeze, Unsqueeze, etc.

I expect that with the provider="DmlExecutionProvider" option, all nodes should be placed on DML only (except where DML has no native support for a particular node). But in the case above, the nodes placed on the CPU are operators that DML does support.

How can I force all nodes to be placed on DML? If the nodes are distributed between CPU and DML, I expect some overhead due to data transfer between the CPU and the GPU.
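
For what it's worth, I noticed the session config key session.disable_cpu_ep_fallback among the ONNX Runtime session option config keys. I am not sure whether it forces nodes onto DML or simply makes session creation fail when a node would fall back to the CPU EP, but is something along these lines the intended mechanism?

import onnxruntime as ort
from optimum.onnxruntime import ORTModelForImageClassification

sess_options = ort.SessionOptions()
# My understanding is that this asks ONNX Runtime not to fall back to the
# CPU EP; please correct me if it behaves differently with DML.
sess_options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")

model = ORTModelForImageClassification.from_pretrained(
    "optimum/vit-base-patch16-224",
    provider="DmlExecutionProvider",
    session_options=sess_options,
)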

Thanks,

Describe scenario use case

Trying to run the Hugging Face Optimum model on GPU/DML with all nodes placed on DML.

@Jose17-ml Jose17-ml added the feature request request for unsupported feature or enhancement label Jun 12, 2024
@github-actions github-actions bot added ep:DML issues related to the DirectML execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. platform:windows issues related to the Windows platform labels Jun 12, 2024
sophies927 (Contributor) commented

@smk2007 can you take a look?

Jose17-ml (Author) commented

Hi @smk2007, did you get a chance to look into the above issue?

Jose17-ml (Author) commented

Hi,

Can someone please check this and update?

Thanks
