
CUDNN_STATUS_MAPPING_ERROR when running directly after TensorRT #116

Open
ghost opened this issue Jan 31, 2021 · 2 comments

ghost commented Jan 31, 2021

How can I run kmcuda synchronously after a TensorRT model performs inference on the same GPU (in a loop)?

For instance, I am already allocating page-locked buffers for my TensorRT model, but I don't explicitly allocate anything up front for kmeans_cuda to run on. Doesn't that mean there could be a conflict if both are accessing the GPU and don't fully "clean up" after themselves?
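For concreteness, the loop is structured roughly like this (a minimal sketch only: `run_trt_inference` is a stand-in for my actual TensorRT execution code, and the cluster count is arbitrary):

```python
import numpy as np
from libKMCUDA import kmeans_cuda

def run_trt_inference(frame):
    # Stand-in for the real TensorRT execute/copy-out code that uses the
    # page-locked buffers mentioned above; here it just fakes a feature matrix.
    return np.random.rand(1024, 128).astype(np.float32)

frames = [np.zeros((3, 224, 224), dtype=np.float32) for _ in range(10)]  # dummy input stream

for frame in frames:
    features = run_trt_inference(frame)  # TensorRT touches the GPU here...
    centroids, assignments = kmeans_cuda(features, 8, seed=3, verbosity=0)  # ...and so does kmcuda
    # on the next iteration, TensorRT fails with CUDNN_STATUS_MAPPING_ERROR
```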

The error I get the next time TensorRT runs (and only after kmcuda has run):

[TensorRT] ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
Also reported here: NVIDIA/TensorRT#303

So I guess my general question is: how should/can I clean up after kmcuda runs? The reason I think preallocating buffers might help is that a very similar Stack Overflow issue reported that as the solution (for TensorFlow and TensorRT on the same GPU).

Environment:

nvcr.io/nvidia/l4t-base:r32.4.4
cuda-10.2
TensorRT 7.1.3

ghost commented Jan 31, 2021

What I do know is that this problem can be solved by isolating TensorRT from kmeans_cuda.

Here's how I've hackily fixed it: I simply run the TensorRT inference (with all its page-locked allocations, engine, stream, context, etc.) in one thread and run kmeans_cuda in a separate thread. A thread-safe queue passes the inference results through to the thread that runs kmeans. There: isolation! No more errors.
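Roughly, it looks like the sketch below (not my exact code; `run_trt_inference` again stands in for the TensorRT part, and the cluster count is arbitrary):

```python
import queue
import threading
import numpy as np
from libKMCUDA import kmeans_cuda

def run_trt_inference(frame):
    # Stand-in for the TensorRT engine/context/stream code, which is created
    # and used only inside trt_worker's thread; here it just fakes features.
    return np.random.rand(1024, 128).astype(np.float32)

results = queue.Queue(maxsize=4)  # thread-safe hand-off between the two threads

def trt_worker(frames):
    # All TensorRT state (engine, stream, page-locked buffers) stays in this thread.
    for frame in frames:
        results.put(run_trt_inference(frame))
    results.put(None)  # sentinel: no more work

def kmeans_worker():
    # kmcuda only ever touches the GPU from this thread.
    while True:
        features = results.get()
        if features is None:
            break
        centroids, assignments = kmeans_cuda(features, 8, seed=3, verbosity=0)
        # ...consume centroids / assignments...

frames = [np.zeros((3, 224, 224), dtype=np.float32) for _ in range(10)]  # dummy input stream
t_trt = threading.Thread(target=trt_worker, args=(frames,))
t_km = threading.Thread(target=kmeans_worker)
t_trt.start(); t_km.start()
t_trt.join(); t_km.join()
```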

But I have no idea why this works, and it feels extremely hacky. Are the devs willing to comment on best practices and caveats for running kmeans_cuda synchronously with other calls to the GPU (using TensorRT or otherwise)?

futureisatyourhand commented Sep 30, 2022

I also encountered the same problem, but in my case I loaded two TRT models at the same time.
My method is:
First, map the torch2trt include and lib paths to the include and lib paths of the corresponding TensorRT version (e.g. TensorRT 8.2.3, TensorRT 7.1).
Then, initialize the two TRT models in two separate classes.
Finally, use one class that uses the two model classes respectively, as in the sketch below.
Note: every time you execute a forward pass or call, you need to add torch.cuda.set_device('cuda:0').
My problem is solved, and the stress test also passed.
My method works in these environments: TensorRT 7.1.2 with torch2trt 0.3.0, and TensorRT 8.2.3 with torch2trt 0.4.0.
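A rough sketch of that structure (the class names and weight paths here are illustrative only, not from my real code):

```python
import torch
from torch2trt import TRTModule

class DetectorTRT:
    """First TRT model, initialized in its own class."""
    def __init__(self, weights_path):
        self.model = TRTModule()
        self.model.load_state_dict(torch.load(weights_path))

    def __call__(self, x):
        torch.cuda.set_device('cuda:0')  # re-select the device on every call
        return self.model(x)

class ClassifierTRT:
    """Second TRT model, initialized in its own class."""
    def __init__(self, weights_path):
        self.model = TRTModule()
        self.model.load_state_dict(torch.load(weights_path))

    def __call__(self, x):
        torch.cuda.set_device('cuda:0')  # re-select the device on every call
        return self.model(x)

class Pipeline:
    """One class that uses the two model classes respectively."""
    def __init__(self, det_weights='detector_trt.pth', cls_weights='classifier_trt.pth'):
        self.detector = DetectorTRT(det_weights)
        self.classifier = ClassifierTRT(cls_weights)

    def forward(self, x):
        boxes = self.detector(x)
        labels = self.classifier(x)
        return boxes, labels
```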

I think this is a resource and GPU contention issue.
