RobertaTokenizer object has no attribute 'add_special_tokens_single_sentence'

See original GitHub issue

While trying to test out the RoBERTa model I received this error. My setup is the same as in the Fine Tune Model section of the README.

transformers==2.0.0 fast-bert==1.4.2
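For reference, the call that triggers the traceback below is the LM data bunch setup from the README. A minimal sketch with placeholder paths and hyperparameters, following the from_raw_corpus signature visible in the traceback (passing the tokenizer as a pretrained-model name is an assumption about fast-bert's API):

import logging
from fast_bert.data_lm import BertLMDataBunch

logger = logging.getLogger()
texts = ['First raw training sentence.', 'Second raw training sentence.']

# Placeholder arguments; real values come from the README's fine-tuning
# example. model_type='roberta' makes fast-bert load a RobertaTokenizer.
databunch_lm = BertLMDataBunch.from_raw_corpus(
    data_dir='./data/',
    text_list=texts,
    tokenizer='roberta-base',  # assumed: a tokenizer object also works here
    batch_size_per_gpu=4,
    max_seq_length=512,
    multi_gpu=False,
    model_type='roberta',
    logger=logger)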

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-c876b1d42fd6> in <module>
      7     multi_gpu=args.multi_gpu,
      8     model_type=args.model_type,
----> 9     logger=logger)

~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in from_raw_corpus(data_dir, text_list, tokenizer, batch_size_per_gpu, max_seq_length, multi_gpu, test_size, model_type, logger, clear_cache, no_cache)
    152                                model_type=model_type,
    153                                logger=logger,
--> 154                                clear_cache=clear_cache, no_cache=no_cache)
    155 
    156     def __init__(self, data_dir, tokenizer, train_file='lm_train.txt', val_file='lm_val.txt',

~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in __init__(self, data_dir, tokenizer, train_file, val_file, batch_size_per_gpu, max_seq_length, multi_gpu, model_type, logger, clear_cache, no_cache)
    209             train_filepath = str(self.data_dir/train_file)
    210             train_dataset = TextDataset(self.tokenizer, train_filepath, cached_features_file,
--> 211                                         self.logger, block_size=self.tokenizer.max_len_single_sentence)
    212 
    213             self.train_batch_size = self.batch_size_per_gpu * \

~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in __init__(self, tokenizer, file_path, cache_path, logger, block_size)
    104 
    105             while len(tokenized_text) >= block_size:  # Truncate in block of block_size
--> 106                 self.examples.append(tokenizer.add_special_tokens_single_sentence(
    107                     tokenized_text[:block_size]))
    108                 tokenized_text = tokenized_text[block_size:]

AttributeError: 'RobertaTokenizer' object has no attribute 'add_special_tokens_single_sentence'

It appears that the RobertaTokenizer has attributes:

add_special_tokens, add_special_tokens_sequence_pair, add_special_tokens_single_sequence, add_tokens

But not add_special_tokens_single_sentence.

It seems this method is quite similar to add_special_tokens_single_sequence, and perhaps that is the intended method.
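A quick way to confirm which variants a given Transformers version exposes is to inspect the tokenizer instance directly. A minimal sketch, assuming the roberta-base vocabulary can be downloaded:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# List every add_special_tokens* method this version actually provides
print([name for name in dir(tokenizer) if name.startswith('add_special_tokens')])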

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
alberduris commented, Nov 4, 2019

This is broken again despite #102, since Hugging Face has made a breaking change in the Transformers repo (see https://github.com/huggingface/transformers/commit/6c1d0bc0665ef01710db301fb1a0a3c23778714a).

The fix is, again, to replace the add_special_tokens_single_sequence call with build_inputs_with_special_tokens.
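Applied to the failing block from fast_bert/data_lm.py shown in the traceback above, the patch would look roughly like this (a sketch, assuming a Transformers version recent enough to expose build_inputs_with_special_tokens):

# Inside fast_bert's TextDataset.__init__: swap the removed method
# for the renamed one; the argument is unchanged.
while len(tokenized_text) >= block_size:  # truncate in blocks of block_size
    self.examples.append(tokenizer.build_inputs_with_special_tokens(
        tokenized_text[:block_size]))
    tokenized_text = tokenized_text[block_size:]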

0 reactions
ddofer commented, Mar 7, 2020

I'm getting this same error while using a number of other tokenizers, including ones I trained myself with the Hugging Face tokenizers library (BertWordPieceTokenizer, SentencePieceBPETokenizer, and ByteLevelBPETokenizer).
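Until fast-bert ships the rename, one stopgap for Transformers tokenizers is to alias the removed names back onto the instance before handing it to fast-bert. A hedged sketch (it assumes a Transformers version that has build_inputs_with_special_tokens, and it will not help the standalone tokenizers-library classes, which never had either method):

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Alias both of the removed method names (used by different fast-bert
# releases) to the current API so the older calls keep working.
tokenizer.add_special_tokens_single_sentence = tokenizer.build_inputs_with_special_tokens
tokenizer.add_special_tokens_single_sequence = tokenizer.build_inputs_with_special_tokens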
