Meet RuntimeError in mmdet3d/ops/spconv/src/indice_cuda.cu


Describe the bug

I followed install.md and ran pip install -v -e . without errors. However, when I train models that use spconv, I get an error. It seems that something is wrong with my spconv.

Reproduction

Did you make any modifications on the code or config? Did you understand what you have modified?

No.

Environment

Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

`Python: 3.6.10 |Anaconda, Inc.| (default, May 8 2020, 02:54:21) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-10.2
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GPU 0,1,2,3: GeForce RTX 2080 Ti
GCC: gcc (GCC) 5.4.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel® Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel® 64 architecture applications
Intel® MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.2
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.5
Magma 2.5.2
Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.3.0
MMCV: 1.0.2
MMDetection: 2.3.0rc0+af33f11
MMDetection Compiler: GCC 7.3
MMDetection CUDA Compiler: 10.2`

Error traceback

    input = module(input)
  File "/home/xx/anaconda3/envs/jio2pt1.5/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xx/fei2_workspace/mmdetection3d/tools/../mmdet3d/ops/spconv/conv.py", line 168, in forward
    grid=input.grid)
  File "/home/xx/fei2_workspace/mmdetection3d/tools/../mmdet3d/ops/spconv/ops.py", line 94, in get_indice_pairs
    int(transpose))
RuntimeError: /home/xx/fei2_workspace/mmdetection3d/mmdet3d/ops/spconv/src/indice_cuda.cu 124
cuda execution failed with error 2
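
For reference, the numeric code in the message can be mapped back to a CUDA runtime error name. The sketch below is a hypothetical helper: `decode_spconv_error` and its small lookup table are not part of mmdet3d or spconv. The table lists only a few values of the `cudaError_t` enum as numbered in CUDA 10.1 and later, so it should be checked against `driver_types.h` of the installed toolkit. Under that numbering, error 2 is `cudaErrorMemoryAllocation`, meaning a device memory allocation failed.

```python
import re

# A few cudaError_t values as numbered in CUDA 10.1+ (assumption: verify
# against driver_types.h of the toolkit actually in use).
KNOWN_CUDA_ERRORS = {
    1: "cudaErrorInvalidValue",
    2: "cudaErrorMemoryAllocation",
    209: "cudaErrorNoKernelImageForDevice",
    700: "cudaErrorIllegalAddress",
}

# Matches the two-line message shown in the traceback above:
#   <path>.cu <line>
#   cuda execution failed with error <code>
_PATTERN = re.compile(
    r"(\S+\.cu)\s+(\d+)\s+cuda execution failed with error (\d+)"
)


def decode_spconv_error(message):
    """Return source file, line, error code and (if known) error name."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    code = int(match.group(3))
    return {
        "file": match.group(1),
        "line": int(match.group(2)),
        "code": code,
        "name": KNOWN_CUDA_ERRORS.get(code, "unknown"),
    }


if __name__ == "__main__":
    msg = (
        "/home/xx/fei2_workspace/mmdetection3d/mmdet3d/ops/spconv/src/"
        "indice_cuda.cu 124\ncuda execution failed with error 2"
    )
    print(decode_spconv_error(msg))
```

Codes missing from the table are reported as "unknown" rather than guessed.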

Issue Analytics

Created 3 years ago
Comments: 11 (2 by maintainers)

Top GitHub Comments

3 reactions

liuzili97 commented, Jul 16, 2020

Hi @liuzili97, thanks for your bug report. There are several things we need to know to speed up debugging.

1. Did you modify the code of spconv?

2. What config are you using?

3. Can you check whether your PyTorch can run CUDA operations successfully on your GPU? We have never encountered this error before. From previous experience, it is possible that 1) the indices are incorrect for some reason; 2) the compilation arch is incorrect; 3) something else unknown is happening.
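
One way to test the first hypothesis above (incorrect indices) is to validate the voxel coordinates before they reach the sparse convolution. The sketch below is a minimal, pure-Python check under stated assumptions: `find_bad_indices` is a hypothetical helper, not an mmdet3d or spconv function, and the row layout `(batch_idx, z, y, x)` with a `(z, y, x)` spatial shape is assumed from spconv's batch-first convention. The shape and batch size in the example are made up for illustration.

```python
from typing import List, Sequence


def find_bad_indices(
    indices: Sequence[Sequence[int]],
    spatial_shape: Sequence[int],
    batch_size: int,
) -> List[int]:
    """Return the row numbers whose coordinates fall outside the grid.

    Each row is assumed to be (batch_idx, z, y, x); this layout is an
    assumption and should be checked against the voxelization code in use.
    """
    bad = []
    for row, (batch_idx, *coords) in enumerate(indices):
        in_batch = 0 <= batch_idx < batch_size
        same_rank = len(coords) == len(spatial_shape)
        in_grid = all(0 <= c < s for c, s in zip(coords, spatial_shape))
        if not (in_batch and same_rank and in_grid):
            bad.append(row)
    return bad


if __name__ == "__main__":
    # Illustrative values only: a (z, y, x) grid and a batch of two samples.
    shape = (41, 1600, 1408)
    coors = [
        (0, 1, 2, 3),    # valid
        (1, 0, 0, 0),    # valid
        (0, 41, 0, 0),   # z is out of range
        (2, 0, 0, 0),    # batch index is out of range
        (0, -1, 0, 0),   # negative coordinate
    ]
    print(find_bad_indices(coors, shape, batch_size=2))
```

With a tensor of coordinates, the rows can be passed as `coors.tolist()`. An empty result would make the index hypothesis less likely but would not rule out the other two.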

Thanks for your reply!

1. I didn’t.
2. PointPillars works well, but every other model that uses spconv fails with this error. For example, I can train PointPillars with configs/pointpillars/hv_pointpillars_secfpn_6x8_160ex2_kitti-3d-3class.py on GPU, but I hit the error when training other models such as PartA2 with configs/parta2/hv_PartA2_secfpn_2x8_cyclic_80e_kitti-3d-3class.py.
3. I think CUDA works well, since PointPillars can be trained on my GPU.

I have tried deleting build and mmdet3d.egg-info and re-running pip install -v -e . again. Compilation still finishes without errors, but the CUDA problem persists.
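
The maintainer's second hypothesis (a wrong compilation arch) can be examined by comparing the GPU's compute capability with the architectures a binary was compiled for. The sketch below parses `-gencode` flags in the format printed in the environment report; `parse_gencode` and `is_covered` are hypothetical helper names. Note the assumption: the flags in that report describe the PyTorch build, while the flags used for the mmdet3d extension itself would have to be taken from the verbose build log. The GeForce RTX 2080 Ti has compute capability 7.5.

```python
import re
from typing import Set, Tuple

Capability = Tuple[int, int]


def parse_gencode(flags: str) -> Tuple[Set[Capability], Set[Capability]]:
    """Split -gencode flags into binary (sm_XX) and PTX (compute_XX) targets.

    Only the code=... part is inspected, since that is what ends up
    embedded in the compiled binary.
    """
    sass: Set[Capability] = set()
    ptx: Set[Capability] = set()
    for kind, digits in re.findall(r"code=(sm|compute)_(\d+)", flags):
        cap = (int(digits[:-1]), int(digits[-1]))
        (sass if kind == "sm" else ptx).add(cap)
    return sass, ptx


def is_covered(flags: str, capability: Capability) -> bool:
    """True if a GPU with this capability can run code built with flags.

    Binary code runs on the same major version with an equal or higher
    minor version; embedded PTX can be JIT-compiled for any capability
    that is not lower than the PTX target.
    """
    sass, ptx = parse_gencode(flags)
    major, minor = capability
    binary_ok = any(m == major and n <= minor for m, n in sass)
    jit_ok = any((m, n) <= (major, minor) for m, n in ptx)
    return binary_ok or jit_ok


if __name__ == "__main__":
    flags = (
        "-gencode;arch=compute_37,code=sm_37;"
        "-gencode;arch=compute_50,code=sm_50;"
        "-gencode;arch=compute_60,code=sm_60;"
        "-gencode;arch=compute_61,code=sm_61;"
        "-gencode;arch=compute_70,code=sm_70;"
        "-gencode;arch=compute_75,code=sm_75;"
        "-gencode;arch=compute_37,code=compute_37"
    )
    print(is_covered(flags, (7, 5)))
```

If the extension's own flags turn out not to cover the GPU, setting the `TORCH_CUDA_ARCH_LIST` environment variable before rebuilding is one way to select the target architectures for PyTorch C++/CUDA extensions.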

0 reactions

rockywind commented, Jul 29, 2021

@liuzili97 What’s the implementation of the function “check_mem”?