Runtime CUDNN error with ONNX model converted from mmdet
Created an ONNX model for the mmdet faster_rcnn_r50_fpn_1x_coco model, both as onnx_static and onnx_dynamic.
Creating the model works, and testing on CPU works.
When testing the model on GPU with onnxruntime-gpu==1.8.1, both produce a CUDNN error:
[E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=connor ; expr=cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize);
2022-06-02 15:00:34.964696798 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running FusedConv node. Name:'Conv_0_Relu_1' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
2022-06-02 15:00:34.964740210 [E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=connor ; expr=cudaEventRecord(current_deferred_release_event, static_cast<cudaStream_t>(GetComputeStream()));
Traceback (most recent call last):
File "main.py", line 80, in <module>
main()
File "main.py", line 63, in main
bbox_xyxy, cls_conf, cls_ids = inference_model(model, img)
File "main.py", line 10, in inference_model
bbox_result = model([img])
File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 107, in __call__
outputs = self._forward({'input': input_img})
File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 43, in _forward
self.ort_session.run_with_iobinding(self.io_binding)
File "/home/connor/anaconda3/envs/onnx/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 229, in run_with_iobinding
self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running FusedConv node. Name:'Conv_0_Relu_1' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
Aborted (core dumped)
The inference code is the same as in mmdeploy; a minimal sketch of that io_binding path follows.
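For context, this is roughly the pattern used (an illustration only, not the exact mmdeploy wrapper code; the input name "input" is taken from the log above, and the dummy image shape is an assumption):

```python
import numpy as np
import onnxruntime as ort
import torch

sess = ort.InferenceSession(
    "end2end.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

img = torch.randn(1, 3, 800, 1344)  # dummy preprocessed image (shape assumed)
io_binding = sess.io_binding()

# Bind the input buffer; device_type must match where the tensor actually lives.
io_binding.bind_input(
    name="input",
    device_type=img.device.type,
    device_id=0,
    element_type=np.float32,
    shape=tuple(img.shape),
    buffer_ptr=img.data_ptr(),
)
# Let ONNX Runtime allocate the outputs (on CPU by default).
for out in sess.get_outputs():
    io_binding.bind_output(out.name)

sess.run_with_iobinding(io_binding)
outputs = io_binding.copy_outputs_to_cpu()
```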
Environment:
pytorch 1.11.0
cudatoolkit 11.3
cudnn 8.2.1
onnxruntime-gpu 1.8.1
Is this an issue with how the mmdetection model was created, or an onnxruntime-gpu specific issue? Any help is appreciated, thank you.
Thanks @tpoisonooo for your help.
Using your recommended env did help.
I was able to make it work in my old environment after getting similar failures in the new env.
I found that the issue was that the input tensor wasn't being moved to the GPU device in my test inference code (see the sketch below). This shouldn't affect anyone else; just noting it down in case someone else hits a similar issue.
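A hedged sketch of the fix, assuming the same session and binding code as above: the tensor handed to bind_input must already live on the CUDA device, otherwise buffer_ptr points at host memory and the GPU kernels read an invalid address, which surfaces as the CUDNN/illegal-memory-access errors above.

```python
import numpy as np
import torch

device = torch.device("cuda:0")
img = img.to(device).contiguous()   # the missing step: move the input to the GPU

io_binding = sess.io_binding()
io_binding.bind_input(
    name="input",
    device_type="cuda",             # now matches where the buffer actually lives
    device_id=0,
    element_type=np.float32,
    shape=tuple(img.shape),
    buffer_ptr=img.data_ptr(),
)
# Let ONNX Runtime allocate the outputs on the GPU as well.
for out in sess.get_outputs():
    io_binding.bind_output(out.name, device_type="cuda", device_id=0)

sess.run_with_iobinding(io_binding)
outputs = io_binding.copy_outputs_to_cpu()
```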
My recommendation:
1. Open $WORK_DIR; there is an end2end.onnx.
2. Unit test ort-gpu inference with end2end.onnx.
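A minimal standalone test along those lines could look like the following (the test name, concrete input sizes, and file location are assumptions, not mmdeploy's own test; adjust them to the actual end2end.onnx in $WORK_DIR):

```python
import numpy as np
import onnxruntime as ort


def test_ort_gpu_inference(model_path="end2end.onnx"):
    sess = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # Make sure the session did not silently fall back to CPU only.
    assert "CUDAExecutionProvider" in sess.get_providers()

    inp = sess.get_inputs()[0]
    # Replace dynamic dimensions with a concrete test size (assumed 800).
    shape = [d if isinstance(d, int) else 800 for d in inp.shape]
    shape[0] = 1  # single-image batch
    dummy = np.random.rand(*shape).astype(np.float32)

    outputs = sess.run(None, {inp.name: dummy})
    assert all(o is not None for o in outputs)


if __name__ == "__main__":
    test_ort_gpu_inference()
```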