ONNX mmdet converted model runtime CUDNN error

See original GitHub issue

I created an ONNX model for the mmdet faster_rcnn_r50_fpn_1x_coco model, both as onnx_static and onnx_dynamic.

Creating the models works, and testing on CPU works.

When testing the models on GPU with onnxruntime-gpu==1.8.1, both produce a CUDNN error:

[E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=connor ; expr=cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize); 
2022-06-02 15:00:34.964696798 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running FusedConv node. Name:'Conv_0_Relu_1' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
2022-06-02 15:00:34.964740210 [E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=connor ; expr=cudaEventRecord(current_deferred_release_event, static_cast<cudaStream_t>(GetComputeStream())); 
Traceback (most recent call last):
  File "main.py", line 80, in <module>
    main()
  File "main.py", line 63, in main
    bbox_xyxy, cls_conf, cls_ids = inference_model(model, img)
  File "main.py", line 10, in inference_model
    bbox_result = model([img])
  File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 107, in __call__
    outputs = self._forward({'input': input_img})
  File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 43, in _forward
    self.ort_session.run_with_iobinding(self.io_binding)
  File "/home/connor/anaconda3/envs/onnx/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 229, in run_with_iobinding
    self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running FusedConv node. Name:'Conv_0_Relu_1' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
Aborted (core dumped)

The inference code is the same as in mmdeploy.

Env:

pytorch 1.11.0
cudatoolkit 11.3
cudnn 8.2.1
onnxruntime-gpu 1.8.1

Is this an issue with how the mmdetection model was created, or an onnxruntime-gpu specific issue? Any help is appreciated, thank you.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 10

Top GitHub Comments

1 reaction
connor-john commented, Jun 19, 2022

Thanks @tpoisonooo for your help.

Using your recommended env did help.

I was able to make it work in my old environment after getting similar failures in the new env.

The issue turned out to be that the input tensor wasn't being moved to the GPU device in my test inference code. This shouldn't affect anyone else; I'm just noting it down in case someone else hits a similar issue (see the sketch below).
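For reference, a minimal sketch of the corrected IOBinding path, assuming onnxruntime-gpu with the CUDA provider. The model path end2end.onnx, the input name 'input' (taken from the traceback above), and the dummy input shape are illustrative, not taken from the actual repro code:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('end2end.onnx',
                            providers=['CUDAExecutionProvider'])

# Dummy input; the shape (1, 3, 800, 1344) is an assumption for illustration.
img = np.random.randn(1, 3, 800, 1344).astype(np.float32)

# Move the input onto the GPU *before* binding it; binding a host buffer
# as a CUDA input can produce exactly this kind of illegal memory access.
x_gpu = ort.OrtValue.ortvalue_from_numpy(img, 'cuda', 0)

io_binding = sess.io_binding()
io_binding.bind_input(name='input', device_type='cuda', device_id=0,
                      element_type=np.float32, shape=x_gpu.shape(),
                      buffer_ptr=x_gpu.data_ptr())
for out in sess.get_outputs():
    io_binding.bind_output(out.name)  # let ORT allocate/copy the outputs

sess.run_with_iobinding(io_binding)
outputs = io_binding.copy_outputs_to_cpu()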

1 reaction
tpoisonooo commented, Jun 3, 2022

My recommendation:

  • Keep using the old env:

pytorch 1.11.0
cudatoolkit 11.3
cudnn 8.2.1

Open $WORK_DIR; there is an end2end.onnx in it.

  • Install a new ort-gpu version, for example 1.11.x.

Unit test ort-gpu inference with end2end.onnx, e.g. as sketched below.
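A minimal sanity check along those lines might look like the following. The input name is read from the session; the fixed dummy shape (1, 3, 800, 1344) is an assumption, since a dynamic model only reports symbolic dims:

import numpy as np
import onnxruntime as ort

# Run end2end.onnx once on the CUDA provider, falling back to CPU.
sess = ort.InferenceSession('end2end.onnx',
                            providers=['CUDAExecutionProvider',
                                       'CPUExecutionProvider'])
inp = sess.get_inputs()[0]
print('input:', inp.name, inp.shape)

dummy = np.zeros((1, 3, 800, 1344), dtype=np.float32)
outs = sess.run(None, {inp.name: dummy})
print('outputs:', [o.shape for o in outs])

If this succeeds under ort-gpu 1.11.x but fails under 1.8.1 with the same model file, the problem lies in the runtime/CUDNN combination rather than in the exported model.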

