FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged.

Hi,

I've been running detectron2 using the tutorial Colab notebook. Today, while training on a dataset that has previously worked, I got an error with the following code:

`from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
import os

cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("3test",)
cfg.DATASETS.TEST = ()  # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"  # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.1
cfg.SOLVER.MAX_ITER = 10000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 100
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.TEST.DETECTIONS_PER_IMAGE = 2000

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()`

I get the following error after a few iterations:

FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged.

I’d really appreciate any way of getting past this error.

Cheers,

Ed

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10

Top GitHub Comments

10 reactions
MiXaiLL76 commented, Feb 5, 2021

Did you manage to solve the problem?

If not, set the learning rate according to your batch size:

num_gpu = 1
bs = (num_gpu * 2)
cfg.SOLVER.BASE_LR = 0.02 * bs / 16  # pick a good LR
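
The rule in the snippet above scales the learning rate linearly with the total batch size, taking 0.02 at 16 images per batch as the reference point. A minimal standalone sketch of that arithmetic follows; the helper name `scaled_base_lr` and its parameter names are hypothetical and not part of detectron2, and the result is only a starting point that may still need tuning for a given dataset.

```python
def scaled_base_lr(ims_per_batch, reference_lr=0.02, reference_batch=16):
    """Scale a reference learning rate linearly with the total batch size.

    Hypothetical helper (not a detectron2 API). The defaults mirror the
    reference values used in the comment above: 0.02 at 16 images per batch.
    """
    if ims_per_batch <= 0:
        raise ValueError("ims_per_batch must be positive")
    return reference_lr * ims_per_batch / reference_batch


num_gpu = 1
ims_per_batch = num_gpu * 2  # matches cfg.SOLVER.IMS_PER_BATCH = 2 in the question
base_lr = scaled_base_lr(ims_per_batch)
print(f"suggested BASE_LR: {base_lr}")
```

With 2 images per batch this gives a value far below the 0.1 used in the original config, which is consistent with the divergence reported in the question. The result would be assigned to `cfg.SOLVER.BASE_LR` before constructing the trainer.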
7 reactions
ppwwyyxx commented, Mar 31, 2020

You probably need a smaller learning rate.

As the issue template mentions:

If you expect the model to converge / work better, note that we do not give suggestions on how to train a new model. Only in one of the two conditions we will help with it: (1) You’re unable to reproduce the results in detectron2 model zoo. (2) It indicates a detectron2 bug.

We provide configs & models with standard academic settings and expect users to have the knowledge to choose or design appropriate models & parameters for their own tasks.
