_batch_encode_plus() got an unexpected keyword argument 'is_pretokenized' using BertTokenizerFast


System Info

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
# training_set is a custom Dataset (defined elsewhere in test.py) whose
# __getitem__ tokenizes one sentence and returns input_ids plus word-level labels
for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]), training_set[0]["labels"]):
  print('{0:10}  {1}'.format(token, label))

The error I am getting is:
Traceback (most recent call last):
  File "C:\Users\1632613\Documents\Anit\NER_Trans\test.py", line 108, in <module>
    for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]), training_set[0]["labels"]):
  File "C:\Users\1632613\Documents\Anit\NER_Trans\test.py", line 66, in __getitem__
    encoding = self.tokenizer(sentence,
  File "C:\Users\1632613\AppData\Local\conda\conda\envs\ner\lib\site-packages\transformers\tokenization_utils_base.py", line 2477, in __call__
    return self.batch_encode_plus(
  File "C:\Users\1632613\AppData\Local\conda\conda\envs\ner\lib\site-packages\transformers\tokenization_utils_base.py", line 2668, in batch_encode_plus
    return self._batch_encode_plus(
TypeError: _batch_encode_plus() got an unexpected keyword argument 'is_pretokenized'
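
For context: fast tokenizers renamed the is_pretokenized argument to is_split_into_words during the 3.x releases, and the old spelling was later removed entirely, which is exactly the TypeError reported here (the same script works on transformers 3.0.2 but fails on 4.19.2, per the comments below). The Dataset class itself is not shown in the issue, but judging from the traceback the fix is to rename that keyword in the self.tokenizer(...) call at line 66 of test.py. A minimal sketch, assuming the usual truncation/padding arguments (not visible in the issue):

# before: raises TypeError on transformers 4.x
encoding = self.tokenizer(sentence, is_pretokenized=True, truncation=True, padding='max_length')

# after: the renamed keyword accepted by transformers 4.x
encoding = self.tokenizer(sentence, is_split_into_words=True, truncation=True, padding='max_length')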

Who can help?

@SaulLu

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

  1. Download the NER dataset from Kaggle (https://www.kaggle.com/datasets/namanj27/ner-dataset).
  2. Run the script below with the transformers and tokenizers versions listed under "Expected behavior" (a reconstruction of the Dataset class it relies on is sketched after this list):

     tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
     for token, label in zip(tokenizer.convert_ids_to_tokens(training_set[0]["input_ids"]), training_set[0]["labels"]):
       print('{0:10}  {1}'.format(token, label))
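
The issue does not include the Dataset class that the traceback points at, so the following is a hypothetical reconstruction (the class name, fields, and truncation/padding arguments are all assumptions), written with the renamed is_split_into_words keyword that current transformers versions expect:

import torch
from torch.utils.data import Dataset

class NERDataset(Dataset):  # hypothetical stand-in for the user's class
    def __init__(self, sentences, word_labels, tokenizer, max_len=128):
        self.sentences = sentences        # each sentence is a list of words
        self.word_labels = word_labels    # one label id per word
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        sentence = self.sentences[idx]
        encoding = self.tokenizer(sentence,
                                  is_split_into_words=True,  # formerly is_pretokenized
                                  truncation=True,
                                  padding='max_length',
                                  max_length=self.max_len)
        # align the word-level labels with the subword tokens; -100 marks
        # special tokens and padding so they are ignored by the loss
        labels = [self.word_labels[idx][w] if w is not None else -100
                  for w in encoding.word_ids()]
        item = {key: torch.tensor(val) for key, val in encoding.items()}
        item["labels"] = torch.tensor(labels)
        return item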

Expected behavior

I expect the script above to print each token together with its label.

Python version: 3.9
tokenizers: 0.12.1
transformers: 4.19.2

Can anyone shed some light on this, please?
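
Once the keyword is renamed, a self-contained sketch of that expected printout looks like this (the sentence and labels are made up for illustration; word_ids() is the fast-tokenizer API for mapping subword tokens back to words):

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
words = ['Harry', 'Potter', 'lives', 'in', 'London']   # made-up example
word_labels = ['B-per', 'I-per', 'O', 'O', 'B-geo']

encoding = tokenizer(words, is_split_into_words=True)
tokens = tokenizer.convert_ids_to_tokens(encoding['input_ids'])
for token, word_id in zip(tokens, encoding.word_ids()):
    label = word_labels[word_id] if word_id is not None else 'O'  # [CLS]/[SEP]
    print('{0:10}  {1}'.format(token, label))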

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
naarkhoo commented, Dec 17, 2022

I am having the same problem.

Here is the output of transformers-cli env:

- `transformers` version: 4.25.1
- Platform: Linux-5.10.133+-x86_64-with-glibc2.27
- Python version: 3.8.16
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): 1.13.0+cu116 (True)
- Tensorflow version (GPU?): 2.9.2 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

You can also find the Colab notebook here.

0 reactions
berkekavak commented, Dec 20, 2022

I'm experiencing the same issue; I think it comes down to version compatibility in PyTorch or Transformers. This notebook is different from the others since the predictions are made sentence-wise.

It works fine with Python 3.7 and Transformers 3.0.2. @SaulLu, I would appreciate your help.
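
This version sensitivity is consistent with the keyword rename described above: 3.0.x accepts is_pretokenized, while recent 4.x releases accept only is_split_into_words. If a script has to run under both lines, one defensive option is to pick the keyword at runtime; a sketch, not part of any official API:

import inspect
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
# check which spelling this transformers version actually supports
params = inspect.signature(tokenizer.__call__).parameters
split_kwarg = ('is_split_into_words' if 'is_split_into_words' in params
               else 'is_pretokenized')

encoding = tokenizer(['Hello', 'world'], **{split_kwarg: True})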
