RobertaTokenizer object has no attribute 'add_special_tokens_single_sentence'

See original GitHub issue

While trying to test out the RoBERTa model I received this error. My setup is the same as in the Fine Tune Model section of the README.

transformers==2.0.0 fast-bert==1.4.2
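For reference, the call that triggers the traceback below is the LM data bunch setup from the README. A minimal sketch with placeholder paths and hyperparameters, following the from_raw_corpus signature visible in the traceback (passing the tokenizer as a pretrained-model name is an assumption about fast-bert's API):

import logging
from fast_bert.data_lm import BertLMDataBunch

logger = logging.getLogger()
texts = ['First raw training sentence.', 'Second raw training sentence.']

# Placeholder arguments; real values come from the README's fine-tuning
# example. model_type='roberta' makes fast-bert load a RobertaTokenizer.
databunch_lm = BertLMDataBunch.from_raw_corpus(
    data_dir='./data/',
    text_list=texts,
    tokenizer='roberta-base',  # assumed: a tokenizer object also works here
    batch_size_per_gpu=4,
    max_seq_length=512,
    multi_gpu=False,
    model_type='roberta',
    logger=logger)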

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-c876b1d42fd6> in <module>
      7     multi_gpu=args.multi_gpu,
      8     model_type=args.model_type,
----> 9     logger=logger)

~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in from_raw_corpus(data_dir, text_list, tokenizer, batch_size_per_gpu, max_seq_length, multi_gpu, test_size, model_type, logger, clear_cache, no_cache)
    152                                model_type=model_type,
    153                                logger=logger,
--> 154                                clear_cache=clear_cache, no_cache=no_cache)
    155 
    156     def __init__(self, data_dir, tokenizer, train_file='lm_train.txt', val_file='lm_val.txt',

~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in __init__(self, data_dir, tokenizer, train_file, val_file, batch_size_per_gpu, max_seq_length, multi_gpu, model_type, logger, clear_cache, no_cache)
    209             train_filepath = str(self.data_dir/train_file)
    210             train_dataset = TextDataset(self.tokenizer, train_filepath, cached_features_file,
--> 211                                         self.logger, block_size=self.tokenizer.max_len_single_sentence)
    212 
    213             self.train_batch_size = self.batch_size_per_gpu * \

~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in __init__(self, tokenizer, file_path, cache_path, logger, block_size)
    104 
    105             while len(tokenized_text) >= block_size:  # Truncate in block of block_size
--> 106                 self.examples.append(tokenizer.add_special_tokens_single_sentence(
    107                     tokenized_text[:block_size]))
    108                 tokenized_text = tokenized_text[block_size:]

AttributeError: 'RobertaTokenizer' object has no attribute 'add_special_tokens_single_sentence'

It appears that the RobertaTokenizer has attributes:

add_special_tokens, add_special_tokens_sequence_pair, add_special_tokens_single_sequence, add_tokens

But not add_special_tokens_single_sentence.

It seems this method is quite similar to add_special_tokens_single_sequence, and perhaps that is the intended method.
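A quick way to confirm which variants a given Transformers version exposes is to inspect the tokenizer instance directly. A minimal sketch, assuming the roberta-base vocabulary can be downloaded:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# List every add_special_tokens* method this version actually provides
print([name for name in dir(tokenizer) if name.startswith('add_special_tokens')])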

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
alberduris commented, Nov 4, 2019

This is broken again despite #102, since Hugging Face has made a breaking change in the Transformers repo (see https://github.com/huggingface/transformers/commit/6c1d0bc0665ef01710db301fb1a0a3c23778714a).

The fix is, again, to replace the add_special_tokens_single_sequence call with build_inputs_with_special_tokens.
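Applied to the failing block from fast_bert/data_lm.py shown in the traceback above, the patch would look roughly like this (a sketch, assuming a Transformers version recent enough to expose build_inputs_with_special_tokens):

# Inside fast_bert's TextDataset.__init__: swap the removed method
# for the renamed one; the argument is unchanged.
while len(tokenized_text) >= block_size:  # truncate in blocks of block_size
    self.examples.append(tokenizer.build_inputs_with_special_tokens(
        tokenized_text[:block_size]))
    tokenized_text = tokenized_text[block_size:]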

0 reactions
ddofer commented, Mar 7, 2020

I'm getting this same error while using a number of other tokenizers, including ones I trained myself with the Hugging Face tokenizers library (BertWordPieceTokenizer, SentencePieceBPETokenizer, and ByteLevelBPETokenizer).
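Until fast-bert ships the rename, one stopgap for Transformers tokenizers is to alias the removed names back onto the instance before handing it to fast-bert. A hedged sketch (it assumes a Transformers version that has build_inputs_with_special_tokens, and it will not help the standalone tokenizers-library classes, which never had either method):

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Alias both of the removed method names (used by different fast-bert
# releases) to the current API so the older calls keep working.
tokenizer.add_special_tokens_single_sentence = tokenizer.build_inputs_with_special_tokens
tokenizer.add_special_tokens_single_sequence = tokenizer.build_inputs_with_special_tokens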
