Tokenizers throwing warning "The current process just got forked, Disabling parallelism to avoid deadlocks.. To disable this warning, please explicitly set TOKENIZERS_PARALLELISM=(true | false)"

I know this warning appears because the transformers library was updated to 3.x, and I know it says to set TOKENIZERS_PARALLELISM to true or false.

My question is: where should I set TOKENIZERS_PARALLELISM = true / false? Is it when defining the tokenizer, like

tok = Tokenizer.from_pretrained('xyz', TOKENIZERS_PARALLELISM=True)  # this doesn't work

or is it when encoding text, like

tok.encode_plus(text_string, some=some, some=some, TOKENIZERS_PARALLELISM=True)  # this also didn't work

Suggestions anyone?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 31
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

nathan-chappell commented, Jul 9, 2020 (12 reactions)

I may be a rookie, but it seems like it would be useful to indicate that this is an environment variable in the warning message.

n1t0 commented, Jul 6, 2020 (11 reactions)

This happens whenever you use multiprocessing (often used by data loaders). The way to disable this warning is to set the TOKENIZERS_PARALLELISM environment variable to the value that makes the most sense for you. By default, we disable parallelism to avoid any hidden deadlock that would be hard to debug, but you might be totally fine keeping it enabled in your specific use case.

You can try to set it to true, and if your process seems to be stuck, doing nothing, then you should use false.

We’ll improve this message to help avoid any confusion (Cf https://github.com/huggingface/tokenizers/issues/328)
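
For readers with the same question as the original post: TOKENIZERS_PARALLELISM is an environment variable, not a keyword argument, so it is set on the process rather than passed to the tokenizer. A minimal sketch of doing that from Python follows. It assumes the variable should be set before the tokenizer is first used, and the subprocess check is only an illustration that child processes inherit the value; it does not load any tokenizer.

```python
import os
import subprocess
import sys

# Set the environment variable as a string, early in the script
# (assumption: before the tokenizers library is first used).
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Illustration only: a child process started from here inherits the value,
# so worker processes see the same setting.
child_value = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['TOKENIZERS_PARALLELISM'])"],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

print(child_value)
```

The same setting can be applied from the shell when launching a script, for example `TOKENIZERS_PARALLELISM=false python train.py`, where `train.py` is a hypothetical script name. Per the comment above, use "true" instead if you want to keep parallelism and your process does not get stuck.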
