Meet RuntimeError in mmdet3d/ops/spconv/src/indice_cuda.cu


Describe the bug

I followed install.md and ran pip install -v -e . without errors. However, when I train models that use spconv, I get an error. It seems that something is wrong with my spconv build.

Reproduction

  1. Did you make any modifications on the code or config? Did you understand what you have modified? No.

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

`Python: 3.6.10 |Anaconda, Inc.| (default, May 8 2020, 02:54:21) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-10.2
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GPU 0,1,2,3: GeForce RTX 2080 Ti
GCC: gcc (GCC) 5.4.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:
  • GCC 7.3
  • C++ Version: 201402
  • Intel® Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel® 64 architecture applications
  • Intel® MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.3.0
MMCV: 1.0.2
MMDetection: 2.3.0rc0+af33f11
MMDetection Compiler: GCC 7.3
MMDetection CUDA Compiler: 10.2`

Error traceback

 File "/home/xx/anaconda3/envs/jio2pt1.5/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    input = module(input)
  File "/home/xx/anaconda3/envs/jio2pt1.5/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xx/fei2_workspace/mmdetection3d/tools/../mmdet3d/ops/spconv/conv.py", line 168, in forward
    grid=input.grid)
  File "/home/xx/fei2_workspace/mmdetection3d/tools/../mmdet3d/ops/spconv/ops.py", line 94, in get_indice_pairs
    int(transpose))
RuntimeError: /home/xx/fei2_workspace/mmdetection3d/mmdet3d/ops/spconv/src/indice_cuda.cu 124
cuda execution failed with error 2
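
The number at the end of the message can be decoded into a symbolic name. The following is a minimal sketch, assuming the printed value is a CUDA runtime `cudaError_t` code (the report does not confirm this). The helper `decode_cuda_error` and the partial lookup table are hypothetical names introduced here for illustration; in the CUDA runtime API, code 2 is `cudaErrorMemoryAllocation`.

```python
import re

# Partial table of CUDA runtime error codes (cudaError_t).
# Only the entries relevant here are listed; the full enum lives in
# driver_types.h of the CUDA toolkit.
CUDA_ERROR_NAMES = {
    0: "cudaSuccess",
    2: "cudaErrorMemoryAllocation",  # the device could not allocate memory
}


def decode_cuda_error(message):
    """Return (code, name) parsed from a 'failed with error N' message.

    Returns None when the message does not match the expected pattern.
    Assumption: N is a cudaError_t value.
    """
    match = re.search(r"cuda execution failed with error (\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CUDA_ERROR_NAMES.get(code, "unknown (see driver_types.h)")


if __name__ == "__main__":
    print(decode_cuda_error("cuda execution failed with error 2"))
```

Under that assumption, the traceback points to a failed device memory allocation rather than a broken build.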

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (2 by maintainers)

Top GitHub Comments

3 reactions
liuzili97 commented, Jul 16, 2020

Hi @liuzili97, thanks for your bug report. There are several things we need to know to speed up debugging.

  1. Did you modify the code of spconv?
  2. What config are you using?
  3. Can you check whether your PyTorch can run CUDA operations successfully on your GPU?

We have never encountered this error before. From previous experience, the possible causes are: 1) the indices are incorrect for some reason; 2) the compilation arch is incorrect; 3) something else that we have not identified.
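
The second possibility (a wrong compilation arch) can be roughly sanity-checked against the environment dump. The sketch below parses an NVCC flag string and tests whether a given compute capability has a binary target. It makes two assumptions. First, the flag string is the one reported for the PyTorch build above; the flags actually used to compile the spconv extension are not shown in the report. Second, the GPU's compute capability is 7.5, which is the value for the GeForce RTX 2080 Ti. The function names are hypothetical.

```python
import re

# NVCC architecture flags copied from the PyTorch build details in the report.
PYTORCH_NVCC_FLAGS = (
    "-gencode;arch=compute_37,code=sm_37;"
    "-gencode;arch=compute_50,code=sm_50;"
    "-gencode;arch=compute_60,code=sm_60;"
    "-gencode;arch=compute_61,code=sm_61;"
    "-gencode;arch=compute_70,code=sm_70;"
    "-gencode;arch=compute_75,code=sm_75;"
    "-gencode;arch=compute_37,code=compute_37"
)


def compiled_sm_targets(nvcc_flags):
    """Return the sorted list of sm_XY binary targets found in the flags."""
    return sorted(set(re.findall(r"code=sm_(\d+)", nvcc_flags)), key=int)


def covers_capability(nvcc_flags, major, minor):
    """True if the flags contain a binary target for capability major.minor."""
    return "{}{}".format(major, minor) in compiled_sm_targets(nvcc_flags)


if __name__ == "__main__":
    print(compiled_sm_targets(PYTORCH_NVCC_FLAGS))
    # Assumption: RTX 2080 Ti has compute capability 7.5.
    print(covers_capability(PYTORCH_NVCC_FLAGS, 7, 5))
```

A match here only shows that the PyTorch binary itself targets the GPU; it says nothing definitive about how the extension was compiled.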

Thanks for your reply!

  1. I didn’t.
  2. PointPillars works well, but every other model that uses spconv fails with this error. For example, I can train PointPillars on GPU using configs/pointpillars/hv_pointpillars_secfpn_6x8_160ex2_kitti-3d-3class.py, but I hit the error when training other models such as PartA2 using configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py
  3. I think CUDA works, since PointPillars can be trained on my GPU.

I have tried deleting build and mmdet3d.egg-info and re-running pip install -v -e . Compilation again raises no errors, but the CUDA error persists.

0 reactions
rockywind commented, Jul 29, 2021

@liuzili97 What’s the implementation of the function “check_mem”?


Top Results From Across the Web

Meet RuntimeError in mmdet3d/ops/spconv/src/indice_cuda.cu
BTW, the referenced issue is simply an "out of memory" error caused by adding codes unrelated to the codebase. You can check whether...
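
If the failure is indeed an out-of-memory condition, one way to look for memory pressure is to read per-GPU usage from `nvidia-smi`. This is only a sketch and not necessarily the check the commenter had in mind. The function names are hypothetical, and the sample numbers in the usage lines are made up for illustration.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines (values in MiB) into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append({
            "gpu": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return parsed memory usage (requires an NVIDIA driver)."""
    output = subprocess.run(
        QUERY, stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout
    return parse_gpu_memory(output)


# Illustrative input only; real values come from query_gpu_memory().
sample = "0, 10500, 11019\n1, 3, 11019\n"
for row in parse_gpu_memory(sample):
    print(row)
```

Calling `query_gpu_memory()` just before and during training would show whether any GPU is close to its limit when the spconv layers run.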