FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged.
Hi,
I’ve been running detectron2 using the tutorial Colab notebook. Today, while training on a dataset that has previously worked, I got an error with the following setup:
`from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
import os
cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("3test",)
cfg.DATASETS.TEST = () # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl" # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.1
cfg.SOLVER.MAX_ITER = 10000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 100
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.TEST.DETECTIONS_PER_IMAGE = 2000
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()`
I get the following error after a few iterations:
FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged.
I’d really appreciate any way of getting past this error.
Cheers,
Ed
You probably need a smaller learning rate.
As the issue template mentions:
We provide configs & models with standard academic settings and expect users to have the knowledge to choose or design appropriate models & parameters for their own tasks.
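To make the learning-rate suggestion more concrete, here is a rough sketch of one common heuristic, the linear scaling rule, which the reply does not itself mention. It assumes the base config behind `mask_rcnn_R_50_FPN_3x.yaml` was tuned with `BASE_LR = 0.02` at `IMS_PER_BATCH = 16`; check these against your copy of the configs. The helper `scale_learning_rate` is hypothetical and not part of detectron2.

```python
# Hypothetical helper, not part of detectron2: linear learning-rate scaling.
def scale_learning_rate(reference_lr: float, reference_batch: int, batch: int) -> float:
    """Scale a reference learning rate in proportion to the batch size."""
    if reference_batch <= 0 or batch <= 0:
        raise ValueError("batch sizes must be positive")
    return reference_lr * batch / reference_batch


# Assumed reference values (verify against the base config you inherit from).
REFERENCE_LR = 0.02
REFERENCE_BATCH = 16

# The question uses IMS_PER_BATCH = 2.
suggested_lr = scale_learning_rate(REFERENCE_LR, REFERENCE_BATCH, batch=2)
print(f"suggested BASE_LR: {suggested_lr:g}")

# Applied to the config from the question, this would look like:
# cfg.SOLVER.IMS_PER_BATCH = 2
# cfg.SOLVER.BASE_LR = suggested_lr
```

Under these assumptions the suggested value is about 40 times smaller than the `0.1` used in the question. Treat it as a starting point only: fine-tuning on a small custom dataset may need a lower value still, so if the loss keeps diverging, reduce it further.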