Runtime CUDNN error with ONNX model converted from mmdet
Created an ONNX model for the mmdet faster_rcnn_r50_fpn_1x_coco model, both as onnx_static and onnx_dynamic.
Creating the model works, and testing on CPU works.
When testing the model on GPU with onnxruntime-gpu==1.8.1, both produce a CUDNN error:
[E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=connor ; expr=cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize);
2022-06-02 15:00:34.964696798 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running FusedConv node. Name:'Conv_0_Relu_1' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
2022-06-02 15:00:34.964740210 [E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=connor ; expr=cudaEventRecord(current_deferred_release_event, static_cast<cudaStream_t>(GetComputeStream()));
Traceback (most recent call last):
File "main.py", line 80, in <module>
main()
File "main.py", line 63, in main
bbox_xyxy, cls_conf, cls_ids = inference_model(model, img)
File "main.py", line 10, in inference_model
bbox_result = model([img])
File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 107, in __call__
outputs = self._forward({'input': input_img})
File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 43, in _forward
self.ort_session.run_with_iobinding(self.io_binding)
File "/home/connor/anaconda3/envs/onnx/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 229, in run_with_iobinding
self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running FusedConv node. Name:'Conv_0_Relu_1' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
Aborted (core dumped)
The inference code is the same as in mmdeploy; a minimal sketch of that io_binding path follows.
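For context, this is roughly the pattern used (an illustration only, not the exact mmdeploy wrapper code; the input name "input" is taken from the log above, and the dummy image shape is an assumption):

```python
import numpy as np
import onnxruntime as ort
import torch

sess = ort.InferenceSession(
    "end2end.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

img = torch.randn(1, 3, 800, 1344)  # dummy preprocessed image (shape assumed)
io_binding = sess.io_binding()

# Bind the input buffer; device_type must match where the tensor actually lives.
io_binding.bind_input(
    name="input",
    device_type=img.device.type,
    device_id=0,
    element_type=np.float32,
    shape=tuple(img.shape),
    buffer_ptr=img.data_ptr(),
)
# Let ONNX Runtime allocate the outputs (on CPU by default).
for out in sess.get_outputs():
    io_binding.bind_output(out.name)

sess.run_with_iobinding(io_binding)
outputs = io_binding.copy_outputs_to_cpu()
```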
Environment:
pytorch 1.11.0
cudatoolkit 11.3
cudnn 8.2.1
onnxruntime-gpu 1.8.1
Is this an issue with how the mmdetection model was created, or an onnxruntime-gpu specific issue? Any help is appreciated, thank you.
Thanks @tpoisonooo for your help.
Using your recommended env did help.
I was able to make it work in my old environment after getting similar failures in the new env.
I found that the issue was that the input tensor wasn't being moved to the GPU device in my test inference code (see the sketch below). This shouldn't affect anyone else; just noting it down in case someone else hits a similar issue.
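A hedged sketch of the fix, assuming the same session and binding code as above: the tensor handed to bind_input must already live on the CUDA device, otherwise buffer_ptr points at host memory and the GPU kernels read an invalid address, which surfaces as the CUDNN/illegal-memory-access errors above.

```python
import numpy as np
import torch

device = torch.device("cuda:0")
img = img.to(device).contiguous()   # the missing step: move the input to the GPU

io_binding = sess.io_binding()
io_binding.bind_input(
    name="input",
    device_type="cuda",             # now matches where the buffer actually lives
    device_id=0,
    element_type=np.float32,
    shape=tuple(img.shape),
    buffer_ptr=img.data_ptr(),
)
# Let ONNX Runtime allocate the outputs on the GPU as well.
for out in sess.get_outputs():
    io_binding.bind_output(out.name, device_type="cuda", device_id=0)

sess.run_with_iobinding(io_binding)
outputs = io_binding.copy_outputs_to_cpu()
```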
My recommendation:
1. Open $WORK_DIR; there is an end2end.onnx.
2. Unit test ort-gpu inference with end2end.onnx.
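A minimal standalone test along those lines could look like the following (the test name, concrete input sizes, and file location are assumptions, not mmdeploy's own test; adjust them to the actual end2end.onnx in $WORK_DIR):

```python
import numpy as np
import onnxruntime as ort


def test_ort_gpu_inference(model_path="end2end.onnx"):
    sess = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # Make sure the session did not silently fall back to CPU only.
    assert "CUDAExecutionProvider" in sess.get_providers()

    inp = sess.get_inputs()[0]
    # Replace dynamic dimensions with a concrete test size (assumed 800).
    shape = [d if isinstance(d, int) else 800 for d in inp.shape]
    shape[0] = 1  # single-image batch
    dummy = np.random.rand(*shape).astype(np.float32)

    outputs = sess.run(None, {inp.name: dummy})
    assert all(o is not None for o in outputs)


if __name__ == "__main__":
    test_ort_gpu_inference()
```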