RuntimeError: sigmoid_focal_loss_forward_impl: implementation for device cuda:0 not found.


Hi, I tried to train the LiDAR-only detector using this command:

torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml

but got the following error. All the other training commands work fine except this one. Do I need to build an additional library? Any suggestions? Thanks.

Traceback (most recent call last):
  File "tools/train.py", line 87, in <module>
    main()
  File "tools/train.py", line 76, in main
    train_model(
  File "/home/trainer/bevnet/mmdet3d/apis/train.py", line 126, in train_model
    runner.run(data_loaders, [("train", 1)])
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/trainer/bevnet/mmdet3d/runner/epoch_based_runner.py", line 14, in train
    super().train(data_loader, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/usr/local/lib/python3.8/dist-packages/mmcv/parallel/distributed.py", line 52, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/trainer/bevnet/mmdet3d/models/fusion_models/base.py", line 78, in train_step
    losses = self(**data)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/trainer/bevnet/mmdet3d/models/fusion_models/bevfusion.py", line 187, in forward
    outputs = self.forward_single(
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/trainer/bevnet/mmdet3d/models/fusion_models/bevfusion.py", line 269, in forward_single
    losses = head.loss(gt_bboxes_3d, gt_labels_3d, pred_dict)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/trainer/bevnet/mmdet3d/models/heads/bbox/transfusion.py", line 645, in loss
    layer_loss_cls = self.loss_cls(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/focal_loss.py", line 233, in forward
    loss_cls = self.loss_weight * calculate_loss_func(
  File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/focal_loss.py", line 139, in sigmoid_focal_loss
    loss = _sigmoid_focal_loss(pred.contiguous(), target.contiguous(), gamma,
  File "/usr/local/lib/python3.8/dist-packages/mmcv/ops/focal_loss.py", line 55, in forward
    ext_module.sigmoid_focal_loss_forward(
RuntimeError: sigmoid_focal_loss_forward_impl: implementation for device cuda:0 not found.
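
The fix is not recorded in this excerpt. One thing worth ruling out, as an assumption rather than a confirmed diagnosis, is that the installed mmcv ops were compiled without CUDA support, or against a different CUDA version than PyTorch. The sketch below only gathers version information; `describe_cuda_build` is a hypothetical helper name, and it assumes an mmcv 1.x install where `mmcv.ops.get_compiling_cuda_version` is available.

```python
import importlib


def describe_cuda_build():
    """Collect CUDA-related version info for torch and mmcv, if importable.

    Hypothetical diagnostic helper, not part of mmcv or this repository.
    """
    info = {
        "torch_cuda_available": None,
        "torch_cuda_version": None,
        "mmcv_version": None,
        "mmcv_ops_cuda_version": None,
    }
    try:
        torch = importlib.import_module("torch")
        info["torch_cuda_available"] = torch.cuda.is_available()
        # torch.version.cuda is None for CPU-only PyTorch builds.
        info["torch_cuda_version"] = torch.version.cuda
    except (ImportError, OSError, AttributeError) as exc:
        info["torch_error"] = repr(exc)
    try:
        mmcv = importlib.import_module("mmcv")
        info["mmcv_version"] = mmcv.__version__
        # Importing mmcv.ops fails if the compiled extension is missing.
        ops = importlib.import_module("mmcv.ops")
        # Reports the CUDA version the ops were compiled with.
        info["mmcv_ops_cuda_version"] = ops.get_compiling_cuda_version()
    except (ImportError, OSError, AttributeError) as exc:
        info["mmcv_error"] = repr(exc)
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_build().items():
        print(f"{key}: {value}")
```

If the mmcv ops report no CUDA version while PyTorch does, or the two versions differ, rebuilding or reinstalling mmcv against the PyTorch CUDA version may be worth trying; this thread does not confirm that as the cause.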

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

1 reaction
YoushaaMurhij commented, Nov 16, 2022

I will try that.

0 reactions
kentang-mit commented, Dec 10, 2022

Please let me know if the solution does not work. We are doing a major reformat of this codebase internally to remove unnecessary dependencies on mmcv/mmdet and make the installation process easier, but this may take a relatively long time (months).
