model.generate with prefix_allowed_tokens_fn throws RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Environment info
- transformers version: 4.15.0
- Platform: Linux-5.4.0-90-generic-x86_64-with-debian-bullseye-sid
- Python version: 3.7.12
- PyTorch version (GPU?): 1.10.0+cu102 (True)
- Tensorflow version (GPU?): 2.7.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Information
Model I am using: T5ForConditionalGeneration
The problem arises when using my own modified scripts: the script to reproduce the error is given below.
The task I am working on is my own task or dataset: it requires conditional generation from T5 such that the output vocabulary is restricted to a small set.
To reproduce
- Run the following script to reproduce the behaviour.
from transformers import T5Tokenizer, T5ForConditionalGeneration

lm_model = 't5-small'
model = T5ForConditionalGeneration.from_pretrained(lm_model)
tokenizer = T5Tokenizer.from_pretrained(lm_model)

def restrict_decode_vocab(batch_idx, prefix_beam):
    # Once the beam already holds 3 tokens, restrict the vocabulary to the
    # tokens of ' '; otherwise allow a small set of sentinel and word tokens.
    if len(prefix_beam) == 3:
        restricted_vocab = tokenizer(' ', return_tensors="pt")['input_ids'].tolist()
    else:
        restricted_vocab = tokenizer('<extra_id_0> cute dog <extra_id_1> the <pad>', return_tensors="pt")['input_ids'].tolist()
    return restricted_vocab
source = ['The <extra_id_0> walks in <extra_id_1> park .']
source_encoding = tokenizer(source[:], padding='longest', return_tensors="pt")
input_ids, attention_mask = source_encoding['input_ids'], source_encoding['attention_mask']
decoded_beams = model.generate(input_ids=input_ids, attention_mask=attention_mask, do_sample=True, num_beams=2, prefix_allowed_tokens_fn=restrict_decode_vocab, min_length=4, max_length=4, remove_invalid_values=True)
print(decoded_beams)
- The above script produces the following stack trace.
/home/jsingh319/uploaded_venvs/venv-koala-torch-1.10-python-3.7.12/lib/python3.7/site-packages/transformers/generation_utils.py:2259: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
next_indices = next_tokens // vocab_size
Traceback (most recent call last):
File "reproduce_error.py", line 17, in <module>
decoded_beams = model.generate(input_ids=input_ids, attention_mask=attention_mask, do_sample=True, num_beams=2, prefix_allowed_tokens_fn=restrict_decode_vocab, min_length=4, max_length=4, remove_invalid_values=True)
File "/home/jsingh319/uploaded_venvs/venv-koala-torch-1.10-python-3.7.12/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/jsingh319/uploaded_venvs/venv-koala-torch-1.10-python-3.7.12/lib/python3.7/site-packages/transformers/generation_utils.py", line 1220, in generate
**model_kwargs,
File "/home/jsingh319/uploaded_venvs/venv-koala-torch-1.10-python-3.7.12/lib/python3.7/site-packages/transformers/generation_utils.py", line 2253, in beam_sample
next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
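For context, a minimal standalone illustration of the failure (my own example, not from the issue): once every logit in a row is -inf, softmax turns the row into nan, and torch.multinomial raises exactly this RuntimeError.

import torch

logits = torch.full((1, 5), float("-inf"))
probs = torch.softmax(logits, dim=-1)    # exp(-inf) / sum(exp(-inf)) == 0 / 0 -> all nan
torch.multinomial(probs, num_samples=2)  # RuntimeError: probability tensor contains either `inf`, `nan` or element < 0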
Expected behavior
No error.
Possible solution
The __call__ method of the InfNanRemoveLogitsProcessor class should include the following statement before returning scores:
scores[scores == float("-inf")] = torch.finfo(scores.dtype).min
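A minimal sketch of the patched processor (the class name is my own; the first two assignments mirror what InfNanRemoveLogitsProcessor already does, and only the -inf line is the proposed addition):

import torch
from transformers import LogitsProcessor

class PatchedInfNanRemoveLogitsProcessor(LogitsProcessor):  # hypothetical name
    def __call__(self, input_ids, scores):
        scores[scores != scores] = 0.0                                   # nan -> 0.0 (existing behaviour)
        scores[scores == float("inf")] = torch.finfo(scores.dtype).max   # +inf -> largest finite value (existing behaviour)
        scores[scores == float("-inf")] = torch.finfo(scores.dtype).min  # proposed: -inf -> smallest finite value
        return scores

With every entry finite, softmax produces a valid (if extremely peaked) distribution and torch.multinomial can sample without raising.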
I don’t think that this is related in any way to the InfNanRemoveLogitsProcessor processor. IMO, the reason for the error here is that in the 3rd generation step, all values of next_token_scores are set to -inf (I think) due to the prefix_allowed_tokens_fn that you’ve added. This is not a bug IMO with transformers, but with the prefix_allowed_tokens_fn function, as it should not set all values to -inf.

A tip from my side @iamjanvijay would be to do the following: create the PrefixConstrainedLogitsProcessor object with your function and just play around with it locally (what happens at generation step 3?). I think you’ll see then that it sets all values to -inf at some point, which it shouldn’t do.
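A rough sketch of that experiment (reusing restrict_decode_vocab, tokenizer, and model from the script above; the dummy tensors and their shapes are my own assumptions for illustration):

import torch
from transformers import PrefixConstrainedLogitsProcessor

processor = PrefixConstrainedLogitsProcessor(restrict_decode_vocab, num_beams=2)
# Pretend we are at generation step 3: each of the 2 beams already holds 3 tokens.
dummy_prefix = torch.zeros((2, 3), dtype=torch.long)      # (num_beams, cur_len)
dummy_scores = torch.zeros((2, model.config.vocab_size))  # (num_beams, vocab_size)
masked = processor(dummy_prefix, dummy_scores)
# True for a beam means every token of that beam was masked to -inf.
print(torch.isinf(masked).all(dim=-1))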
I’ll close my PR in the meantime. We can reopen it if needed, but I tend to agree with @patrickvonplaten that having everything float(-inf) can be considered a bug already.