RuntimeError: sigmoid_focal_loss_forward_impl: implementation for device cuda:0 not found.
Hi, I tried to train the LiDAR-only detector using this command:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml
but got the following error. All the other training commands work fine except this one. Do I need to build an additional library? Any suggestions? Thanks.
Traceback (most recent call last):
File "tools/train.py", line 87, in <module>
main()
File "tools/train.py", line 76, in main
train_model(
File "/home/trainer/bevnet/mmdet3d/apis/train.py", line 126, in train_model
runner.run(data_loaders, [("train", 1)])
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/trainer/bevnet/mmdet3d/runner/epoch_based_runner.py", line 14, in train
super().train(data_loader, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "/usr/local/lib/python3.8/dist-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/trainer/bevnet/mmdet3d/models/fusion_models/base.py", line 78, in train_step
losses = self(**data)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/trainer/bevnet/mmdet3d/models/fusion_models/bevfusion.py", line 187, in forward
outputs = self.forward_single(
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/trainer/bevnet/mmdet3d/models/fusion_models/bevfusion.py", line 269, in forward_single
losses = head.loss(gt_bboxes_3d, gt_labels_3d, pred_dict)
File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/trainer/bevnet/mmdet3d/models/heads/bbox/transfusion.py", line 645, in loss
layer_loss_cls = self.loss_cls(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/focal_loss.py", line 233, in forward
loss_cls = self.loss_weight * calculate_loss_func(
File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/focal_loss.py", line 139, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred.contiguous(), target.contiguous(), gamma,
File "/usr/local/lib/python3.8/dist-packages/mmcv/ops/focal_loss.py", line 55, in forward
ext_module.sigmoid_focal_loss_forward(
RuntimeError: sigmoid_focal_loss_forward_impl: implementation for device cuda:0 not found.
I will try that.
Please let me know if the solution does not work. We are doing a major reformat of this codebase internally to remove unnecessary dependencies on mmcv/mmdet and make the installation process easier, but this process might take a relatively long time (months).
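For anyone who hits the same error: the message comes from mmcv's compiled extension and usually means that no CUDA kernel was registered for the op, which typically happens when mmcv was built or installed without its CUDA ops. The fix that was suggested is not recorded in this thread, so the snippet below is only a diagnostic sketch, not the maintainers' solution. The helper name `describe_mmcv_cuda_build` is hypothetical. It assumes an mmcv 1.x install where `mmcv.ops.get_compiling_cuda_version()` is available, and it tolerates a missing torch or mmcv.

```python
import importlib


def describe_mmcv_cuda_build():
    """Collect facts that help tell whether mmcv's ops were compiled with CUDA.

    Hypothetical helper for illustration; it only reads version information
    and never launches a CUDA kernel.
    """
    report = {
        "torch_cuda_available": None,
        "torch_cuda_version": None,
        "mmcv_version": None,
        "mmcv_compiling_cuda_version": None,
    }

    try:
        torch = importlib.import_module("torch")
        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_cuda_version"] = torch.version.cuda
    except ImportError:
        # torch is not installed in this environment; leave the fields as None.
        pass

    try:
        mmcv = importlib.import_module("mmcv")
        report["mmcv_version"] = mmcv.__version__
        # Assumption: mmcv 1.x with compiled ops exposes this function.
        ops = importlib.import_module("mmcv.ops")
        report["mmcv_compiling_cuda_version"] = ops.get_compiling_cuda_version()
    except Exception:
        # mmcv is missing, was installed without compiled ops, or the
        # extension failed to load; leave the remaining fields as None.
        pass

    return report


if __name__ == "__main__":
    for key, value in describe_mmcv_cuda_build().items():
        print(f"{key}: {value}")
```

If torch reports a CUDA version but the mmcv field stays `None` or reads `not available`, the mmcv ops were most likely compiled without CUDA, and rebuilding mmcv in an environment where the GPU and CUDA toolkit are visible is a reasonable next step to try.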