Failed to determine best cudnn convolution algorithm/No GPU/TPU found


RTX 3080 / CUDA 11.1 / cuDNN 8.2.1 / Ubuntu 16.04

This problem occurs with jaxlib-0.1.72+cuda111. When I update to 0.1.74, the error disappears. However, with 0.1.74, JAX cannot detect the GPU, although TensorFlow can.

So whether I use 0.1.72 or 0.1.74, I always run into a problem.
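To compare the two jaxlib versions, it helps to print what the installed JAX actually sees. The following is a minimal diagnostic sketch, not something from the original report; the helper name `describe_jax_install` is hypothetical, and it only assumes that `jax.devices()` and `jax.default_backend()` are available in the installed release.

```python
import importlib


def describe_jax_install() -> dict:
    """Hypothetical helper: collect version and device info for the installed JAX."""
    info = {}
    try:
        jax = importlib.import_module("jax")
        jaxlib_version = importlib.import_module("jaxlib.version")
    except ImportError as exc:
        # JAX or jaxlib is not installed in this environment.
        info["error"] = f"JAX not importable: {exc}"
        return info
    info["jax_version"] = jax.__version__
    info["jaxlib_version"] = jaxlib_version.__version__
    # Reports "gpu" only when a GPU backend was found; otherwise usually "cpu".
    info["backend"] = jax.default_backend()
    info["devices"] = [str(d) for d in jax.devices()]
    return info


print(describe_jax_install())
```

If the reported backend is `cpu` on a machine with a working GPU, one possible cause is that the installed jaxlib wheel is a CPU-only build or does not match the local CUDA version.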

`RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm: INTERNAL: All algorithms tried for %custom-call.1 = (f32[1,112,112,64]{2,1,3,0}, u8[0]{0}) custom-call(f32[1,229,229,3]{2,1,3,0} %pad, f32[7,7,3,64]{1,0,2,3} %copy.4), window={size=7x7 stride=2x2}, dim_labels=b01f_01io->b01f, custom_call_target=“__cudnn$convForward”, metadata={op_type=“conv_general_dilated” op_name=“jit(conv_general_dilated)/conv_general_dilated[\n batch_group_count=1\n dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2))\n feature_group_count=1\n lhs_dilation=(1, 1)\n lhs_shape=(1, 224, 224, 3)\n padding=((2, 3), (2, 3))\n precision=None\n preferred_element_type=None\n rhs_dilation=(1, 1)\n rhs_shape=(7, 7, 3, 64)\n window_strides=(2, 2)\n]” source_file=“/media/node/Materials/anaconda3/envs/xmcgan/lib/python3.9/site-packages/flax/linen/linear.py” source_line=282}, backend_config=“{"algorithm":"0","tensor_ops_enabled":false,"conv_result_scale":1,"activation_mode":"0","side_input_scale":0}” failed. Falling back to default algorithm.

Convolution performance may be suboptimal. To ignore this failure and try to use a fallback algorithm, use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false. Please also file a bug for the root cause of failing autotuning. `
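The error message itself suggests the `--xla_gpu_strict_conv_algorithm_picker=false` flag. A minimal sketch of setting it from Python follows, assuming the variable has to be in place before JAX is first imported; the helper `add_xla_flag` is a hypothetical name used only for illustration. As the message says, this only ignores the autotuning failure and does not address its root cause.

```python
import os


def add_xla_flag(flag: str) -> str:
    """Hypothetical helper: append `flag` to XLA_FLAGS, keeping any flags already set."""
    existing = os.environ.get("XLA_FLAGS", "").split()
    if flag not in existing:
        existing.append(flag)
    os.environ["XLA_FLAGS"] = " ".join(existing)
    return os.environ["XLA_FLAGS"]


# Flag name taken verbatim from the error message above.
add_xla_flag("--xla_gpu_strict_conv_algorithm_picker=false")

# Import JAX only after XLA_FLAGS is set (assumption: the flag is read at backend startup).
# import jax
```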

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 12
  • Comments: 12 (2 by maintainers)

Top GitHub Comments

11 reactions
half-potato commented, Mar 16, 2022

It turned out to be an OOM (out-of-memory) error hidden behind a misleading error message. The solution is in #8506: set the environment variable XLA_PYTHON_CLIENT_MEM_FRACTION=0.87. There appears to be some kind of issue with how jax.scipy.signal.convolve2d handles preallocated memory. It would be nice to have a better error message for this.
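A minimal sketch of applying that memory setting from inside a script, assuming the variable must be set before JAX initializes its GPU backend. The helper name `configure_jax_gpu_memory` is hypothetical, and 0.87 is simply the value reported in this comment, not a general recommendation.

```python
import os


def configure_jax_gpu_memory(fraction: float = 0.87) -> None:
    """Hypothetical helper: cap the share of GPU memory that JAX preallocates."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError(f"fraction must be in (0, 1], got {fraction}")
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = str(fraction)


# Call this before the first `import jax` in the process.
configure_jax_gpu_memory(0.87)

# Alternative from JAX's GPU memory allocation notes (not from this thread):
# os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# import jax
```

The same effect can be had by exporting the variable in the shell before launching Python; setting it in code keeps the workaround next to the script that needs it.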

7 reactions
ross-Hr commented, Jan 4, 2022

Did you fix the error?


