Some weights of {} were not initialized from the model checkpoint

See original GitHub issue

I keep failing to load the model checkpoint. I built a model that inherits from PreTrainedModel and has a RoBERTa model inside its initialization. Training this model with the Trainer works fine, but when I try to load the checkpoint using from_pretrained, it keeps failing. Can someone help me out? Thanks

Structure of my model

import torch
from transformers import PreTrainedModel, RobertaModel, RobertaConfig

class MaskClassifier(PreTrainedModel):
    def __init__(self, config, path):
        super().__init__(config=config)
        self.roberta = RobertaModel.from_pretrained(path)
        self.max_mask = 10
        self.hidden_size = RobertaConfig().hidden_size
        self.linear1 = torch.nn.Linear(2 * self.hidden_size, self.hidden_size)
        self.linear2 = torch.nn.Linear(self.hidden_size, self.max_mask + 1)
        self.softmax = torch.nn.Softmax(dim=1)


    def forward(self, input_ids, attention_mask, token_type_ids, labels=None):
        # Feed input to RoBERTa
        ...

Initialize before training

config = RobertaConfig()
config.max_position_embeddings = 514
config.type_vocab_size = 1
config.vocab_size = 50265

model = MaskClassifier(config=config, path='roberta-base')

Saving after training

trainer.save_model('./slogan_pretrained')
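
The Trainer setup itself isn't shown above; a minimal sketch of how such a checkpoint is typically produced (the TrainingArguments values and train_dataset below are placeholders, not taken from the original post) would be:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(output_dir='./results', num_train_epochs=3)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)  # train_dataset: placeholder dataset
trainer.train()
trainer.save_model('./slogan_pretrained')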

Loading the checkpoint

config = RobertaConfig()
config.max_position_embeddings = 514
config.type_vocab_size = 1
config.vocab_size = 50265

model = MaskClassifier.from_pretrained(path, config=config, path='roberta-base')

I found a similar issue (https://github.com/huggingface/transformers/issues/2886), but I don't know exactly how I should override the from_pretrained function, and even when I tried overriding it, the checkpoint still can't be loaded.
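
For reference, remaining positional arguments to from_pretrained are forwarded to the model's __init__, so a call along the following lines should construct the model without any override (a sketch only, reusing the checkpoint directory from the save step above):

# './slogan_pretrained' is the saved checkpoint; 'roberta-base' is forwarded to __init__ as `path`
model = MaskClassifier.from_pretrained('./slogan_pretrained', 'roberta-base', config=config)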

Error Message

Some weights of MaskClassifier were not initialized from the model checkpoint at /home/yeoun/slogans/slogan_pretrained and are newly initialized: ['.roberta.embeddings.position_ids', '.roberta.embeddings.word_embeddings.weight', '.roberta.embeddings.position_embeddings.weight', '.roberta.embeddings.token_type_embeddings.weight', '.roberta.embeddings.LayerNorm.weight', '.roberta.embeddings.LayerNorm.bias', '.roberta.encoder.layer.0.attention.self.query.weight', '.roberta.encoder.layer.0.attention.self.query.bias', '.roberta.encoder.layer.0.attention.self.key.weight', '.roberta.encoder.layer.0.attention.self.key.bias', '.roberta.encoder.layer.0.attention.self.value.weight', …

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
LysandreJik commented, Jan 29, 2021

Hi! Thanks for opening an issue. I see two issues with your setup here:

  • Why are you using from_pretrained to load the RobertaModel inside your pre-trained model? You should just initialize a RobertaModel from the configuration imo.
  • Instead of PreTrainedModel, I would instead use RobertaPreTrainedModel.

See the below script for an example of what I would recommend. I’m saving & reloading the model to make sure that all the weights get saved/loaded:

from transformers import RobertaModel, RobertaConfig, logging
from transformers.models.roberta.modeling_roberta import RobertaPreTrainedModel
import torch

logging.set_verbosity_info()

class MaskClassifier(RobertaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config=config)
        self.roberta = RobertaModel(config)
        self.max_mask = 10
        self.hidden_size = config.hidden_size
        self.linear1 = torch.nn.Linear(2 * self.hidden_size, self.hidden_size)
        self.linear2 = torch.nn.Linear(self.hidden_size, self.max_mask + 1)
        self.softmax = torch.nn.Softmax(dim=1)

        self.init_weights()

model = MaskClassifier.from_pretrained("roberta-base")

Let’s see the logs now, for the first load using the roberta-base checkpoint:

Some weights of the model checkpoint at roberta-base were not used when initializing MaskClassifier: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing MaskClassifier from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing MaskClassifier from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of MaskClassifier were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.embeddings.position_ids', 'linear1.weight', 'linear1.bias', 'linear2.weight', 'linear2.bias']

The warning tells you that the lm_head weights are not used, and that the following layers are newly initialized: linear1 and linear2. Since you're not using the LM head, and those two layers are the ones you just added, there's nothing to worry about.
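
If you'd rather inspect this programmatically than read the logs, from_pretrained also accepts output_loading_info=True, which returns the missing and unexpected keys alongside the model:

model, loading_info = MaskClassifier.from_pretrained("roberta-base", output_loading_info=True)
print(loading_info["missing_keys"])     # the newly initialized layers, e.g. linear1/linear2
print(loading_info["unexpected_keys"])  # the unused lm_head weights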

Let’s try saving the model and reloading it again:

model.save_pretrained("here")
MaskClassifier.from_pretrained("here")

The logs show:

All model checkpoint weights were used when initializing MaskClassifier.
All the weights of MaskClassifier were initialized from the model checkpoint at here.

Success 🎉
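
As an additional sanity check, one could also verify that the reloaded parameters match the in-memory model exactly:

# Compare every parameter of the saved model against the reloaded copy
reloaded = MaskClassifier.from_pretrained("here")
for name, tensor in model.state_dict().items():
    assert torch.equal(tensor, reloaded.state_dict()[name]), name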

0 reactions
LysandreJik commented, Feb 1, 2021

Glad I could help!
