_batch_encode_plus() got an unexpected keyword argument 'is_pretokenized' using BertTokenizerFast
System Info
```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# training_set is the NER dataset wrapper defined earlier in test.py
for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]),
                        training_set[0]["labels"]):
    print('{0:10} {1}'.format(token, label))
```
The error I am getting is:
```
Traceback (most recent call last):
  File "C:\Users\1632613\Documents\Anit\NER_Trans\test.py", line 108, in <module>
    for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]), training_set[0]["labels"]):
  File "C:\Users\1632613\Documents\Anit\NER_Trans\test.py", line 66, in __getitem__
    encoding = self.tokenizer(sentence,
  File "C:\Users\1632613\AppData\Local\conda\conda\envs\ner\lib\site-packages\transformers\tokenization_utils_base.py", line 2477, in __call__
    return self.batch_encode_plus(
  File "C:\Users\1632613\AppData\Local\conda\conda\envs\ner\lib\site-packages\transformers\tokenization_utils_base.py", line 2668, in batch_encode_plus
    return self._batch_encode_plus(
TypeError: _batch_encode_plus() got an unexpected keyword argument 'is_pretokenized'
```
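For context, this TypeError lines up with a kwarg rename in transformers: `is_pretokenized` was the old (3.0.x-era) name, and it was renamed to `is_split_into_words`, which is the only spelling current releases such as 4.19.2 accept. A minimal sketch of the failing versus working call (the example `sentence` is made up; the real words come from the dataset):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Stand-in for one pre-split training example from the NER dataset.
sentence = ["John", "lives", "in", "New", "York"]

# On transformers 4.19.2 the old kwarg reproduces the TypeError above:
#   tokenizer(sentence, is_pretokenized=True)

# The renamed flag is the accepted spelling:
encoding = tokenizer(sentence, is_split_into_words=True)
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
```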
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
- Download the NER Dataset from the Kaggle link (https://www.kaggle.com/datasets/namanj27/ner-dataset)
- Use the script below with the mentioned versions of transformers and tokenizers:

```python
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]),
                        training_set[0]["labels"]):
    print('{0:10} {1}'.format(token, label))
```
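The failing call itself is inside the dataset's `__getitem__` (test.py line 66 in the traceback). A hedged reconstruction of that method follows; apart from the `self.tokenizer(sentence, ...)` call, the class and field names are assumptions:

```python
import torch

class NERDataset(torch.utils.data.Dataset):
    """Hypothetical stand-in for the dataset class in test.py."""

    def __init__(self, sentences, labels, tokenizer, max_len=128):
        self.sentences = sentences  # each item: a list of words
        self.labels = labels        # each item: a list of per-word tags
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, index):
        sentence = self.sentences[index]
        # This is the call that raises on transformers 4.19.2 if it still
        # passes is_pretokenized=True; is_split_into_words is the 4.x name.
        # NB: aligning per-word tags to subword tokens is omitted here.
        encoding = self.tokenizer(
            sentence,
            is_split_into_words=True,
            padding="max_length",
            truncation=True,
            max_length=self.max_len,
        )
        encoding["labels"] = self.labels[index]
        return encoding
```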
Expected behavior
I expect the script above to print each token alongside its label.
Environment:
- Python: 3.9
- tokenizers: 0.12.1
- transformers: 4.19.2
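If it helps to double-check, the installed versions can be printed directly:

```python
import tokenizers
import transformers

print("transformers:", transformers.__version__)  # 4.19.2 in this report
print("tokenizers:", tokenizers.__version__)      # 0.12.1 in this report
```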
Can anyone shed some light on this, please?
I am having the same problem. Here is the output of `transformers-cli env`; you can also find the Colab notebook here.
Experiencing the same issue. I think it depends on the version compatibility of PyTorch or Transformers. This notebook is different from the others since the predictions are made sentence-wise.

It works very well with Python 3.7 and Transformers 3.0.2. @SaulLu, I would appreciate your help.
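If the notebook needs to run on both the old (3.0.x) and current (4.x) releases, one option is a small shim that tries the current kwarg name first and falls back to the old one. This is only a sketch, not something from the original notebook:

```python
def encode_pretokenized(tokenizer, words, **kwargs):
    """Tokenize a list of pre-split words across transformers versions.

    Sketch only: 4.x releases expect `is_split_into_words`, while the
    3.0.x releases mentioned above expect `is_pretokenized`.
    """
    try:
        return tokenizer(words, is_split_into_words=True, **kwargs)
    except TypeError:
        # Older releases reject the newer kwarg name with a TypeError.
        return tokenizer(words, is_pretokenized=True, **kwargs)
```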