RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.
During training, the following error occurs and training is interrupted.
[TRAIN] Iter: 40300 Loss: 0.011321269907057285 PSNR: 23.059185028076172
11%|█████████████████████▎                   | 20356/180001 [1:25:30<11:07:17, 3.99it/s][W python_anomaly_mode.cpp:104] Warning: Error detected in PowBackward0. Traceback of forward call that caused the error:
  File "run_nerf.py", line 858, in <module>
    train()
  File "run_nerf.py", line 751, in train
    img_loss0 = img2mse(extras['rgb0'], target_s)
  File "/app/nerf/run_nerf_helpers.py", line 12, in <lambda>
    img2mse = lambda x, y : torch.mean((x - y) ** 2)
 (function _print_stack)
11%|█████████████████████▎ | 20356/180001 [1:25:30<11:10:36, 3.97it/s]
Traceback (most recent call last):
  File "run_nerf.py", line 858, in <module>
    train()
  File "run_nerf.py", line 755, in train
    loss.backward()
  File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.
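For what it's worth, the error can be reproduced in isolation: if the network output already contains a NaN, the backward pass of `(x - y) ** 2` (`PowBackward0`) propagates it, and anomaly mode turns that into exactly this RuntimeError. A minimal sketch (the tensors are illustrative, not from run_nerf.py):

```python
import torch

# Enable anomaly detection, as in the training run above.
torch.autograd.set_detect_anomaly(True)

# Pretend the model emitted a NaN prediction for one sample.
pred = torch.tensor([0.5, float('nan')], requires_grad=True)
target = torch.zeros(2)

# Same loss as img2mse in run_nerf_helpers.py.
loss = torch.mean((pred - target) ** 2)

error_message = ""
try:
    loss.backward()
except RuntimeError as exc:
    # Anomaly mode reports the first backward node that produced NaN.
    error_message = str(exc)

print(error_message)
```

So the NaN is already present in the forward pass; `PowBackward0` is just the first node where anomaly detection notices it.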
Here’s my configuration.
expname = mydata_test
basedir = ./logs
datadir = ./data/nerf_llff_data/mydata
dataset_type = llff
factor = 8
llffhold = 8
N_rand = 1024
N_samples = 64
N_importance = 64
use_viewdirs = True
raw_noise_std = 1e0
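For context on `raw_noise_std = 1e0`: in the reference NeRF code this setting regularizes the density head by adding Gaussian noise to the raw sigma before the ReLU. A sketch of that step, assuming the standard `raw2outputs` behavior (`noisy_sigma` is an illustrative name):

```python
import torch
import torch.nn.functional as F

def noisy_sigma(raw_sigma: torch.Tensor, raw_noise_std: float = 1.0) -> torch.Tensor:
    """Density regularization as in NeRF's raw2outputs (sketch)."""
    noise = 0.0
    if raw_noise_std > 0.0:
        # Gaussian noise on the raw density, scaled by raw_noise_std.
        noise = torch.randn(raw_sigma.shape) * raw_noise_std
    return F.relu(raw_sigma + noise)
```

Setting `raw_noise_std = 0` disables the noise, which is one variable worth ruling out when hunting NaNs.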
I can confirm this problem is happening to me on https://github.com/apchenstu/mvsnerf, with either the lego synthetic dataset or the orchid LLFF dataset.
I’ll try to see how to make this reproducible.
If the error is in the 0th output, that means your weights are not yet fully trained, so some values in some batch's predictions during your first epoch are NaN. So it's not your inputs but your model's predictions that are NaN. This could be an overflow or underflow error, and it will make any loss function give you tensor(nan). What you can do is add a check for when the loss is NaN, skip that update, and let the weights adjust themselves.
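That check could look like the following sketch (`safe_step` and the optimizer wiring are illustrative, not part of run_nerf.py):

```python
import torch

def safe_step(optimizer: torch.optim.Optimizer, loss: torch.Tensor) -> bool:
    """Skip the update when the loss is non-finite, so one bad batch
    does not poison the weights. Returns True if a step was taken."""
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        # NaN or inf loss: drop this batch instead of backpropagating it.
        return False
    loss.backward()
    optimizer.step()
    return True
```

In the training loop you would then write `if not safe_step(optimizer, loss): continue`. Note this only hides the symptom; if the NaNs come from the model itself, they will keep appearing until the underlying overflow/underflow is fixed.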