UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 962: character maps to <undefined>


I am trying to run `model.py`, but I am getting the following error:

```
D:\imad_web\kaggle-quora-dup_24_position>python model.py
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Creating the vocabulary of words occurred more than 100
Traceback (most recent call last):
  File "model.py", line 122, in <module>
    embeddings_index = get_embedding()
  File "model.py", line 55, in get_embedding
    for line in f:
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 962: character maps to <undefined>
```

```python
def get_embedding():
    embeddings_index = {}
    f = open(EMBEDDING_FILE)
    for line in f:  # line 55
        values = line.split()
        word = values[0]
        if len(values) == EMBEDDING_DIM + 1 and word in top_words:
            coefs = np.asarray(values[1:], dtype="float32")
            embeddings_index[word] = coefs
    f.close()
    return embeddings_index
```
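The traceback shows the file being decoded by `encodings\cp1252.py`: because `get_embedding()` calls `open()` without an `encoding` argument, Python fell back to the platform default, which on this Windows install is cp1252, and the byte 0x90 has no mapping in cp1252. A minimal standard-library sketch of the same failure, assuming (as the fix in the comments suggests) that the embedding file is really UTF-8; the sample line is made up for illustration:

```python
# Hypothetical line from a UTF-8 embedding file. Cyrillic capital A (U+0410)
# is encoded in UTF-8 as the two bytes D0 90.
raw = "word \u0410 0.1 0.2\n".encode("utf-8")
assert raw[5:7] == b"\xd0\x90"

# cp1252 maps 0xD0 to 'Ð' but leaves 0x90 undefined, so decoding fails.
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)  # 'charmap' codec can't decode byte 0x90 in position 6: character maps to <undefined>

# Decoding with the encoding the file was actually written in succeeds.
print(raw.decode("utf-8").split()[1] == "\u0410")  # True
```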

```python
vectorizer = CountVectorizer(lowercase=False, token_pattern=r"\S+", min_df=MIN_WORD_OCCURRENCE)
vectorizer.fit(all_questions)
top_words = set(vectorizer.vocabulary_.keys())
top_words.add(REPLACE_WORD)

embeddings_index = get_embedding()  # line 122
print("Words are not found in the embedding:", top_words - embeddings_index.keys())
top_words = embeddings_index.keys()
```
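Once loading succeeds, the next line subtracts `embeddings_index.keys()` from the `top_words` set. This works because dict key views support set operations, so the result is a plain set of vocabulary words that have no vector. A self-contained sketch with hypothetical words and vectors:

```python
# Hypothetical vocabulary and embeddings, for illustration only.
top_words = {"cat", "dog", "zebra"}
embeddings_index = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}

# A set minus a dict_keys view yields a set of the words with no vector.
missing = top_words - embeddings_index.keys()
print("Words are not found in the embedding:", missing)  # ... {'zebra'}

# The script then narrows the vocabulary to the words that do have vectors.
top_words = embeddings_index.keys()
print(sorted(top_words))  # ['cat', 'dog']
```

Note that after the last assignment `top_words` is a live view, not a copy: it reflects any later changes to `embeddings_index`.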

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

16 reactions
prerak13 commented, Mar 31, 2018

No bug. Anyway, I've found a solution: I used `f = open(EMBEDDING_FILE, encoding="utf-8")` and now it's working 😃
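The same fix applied to the whole function might look like the sketch below. It assumes the embedding file is UTF-8 encoded (the fact that this fix worked suggests it is). The `with` statement closes the file even if parsing raises. The parameters `path`, `embedding_dim` and `top_words` are hypothetical stand-ins for the globals in `model.py`, and `array("f")` is a standard-library substitute for the float32 NumPy array the original builds.

```python
import os
import tempfile
from array import array


def get_embedding(path, embedding_dim, top_words):
    """Load vectors for the words in top_words from a GloVe-style text file.

    Sketch only: the parameters replace model.py's EMBEDDING_FILE,
    EMBEDDING_DIM and top_words globals.
    """
    embeddings_index = {}
    # Decode explicitly as UTF-8 instead of the platform default (cp1252 here).
    with open(path, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            if not values:  # skip blank lines
                continue
            word = values[0]
            if len(values) == embedding_dim + 1 and word in top_words:
                # model.py uses np.asarray(values[1:], dtype="float32");
                # array("f") likewise stores 32-bit floats.
                embeddings_index[word] = array("f", map(float, values[1:]))
    return embeddings_index


# Usage with a throwaway file containing a non-ASCII word.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("cat 0.5 0.25\n\u0410 1.0 2.0\n")
index = get_embedding(tmp.name, 2, {"cat", "\u0410"})
os.remove(tmp.name)
print(index["cat"].tolist())  # [0.5, 0.25]
```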

7 reactions
felipetoffoli commented, Jul 7, 2018

`f = open(EMBEDDING_FILE, "rb", buffering=0)`
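Opening the file in binary mode avoids text decoding entirely, but each line then comes back as `bytes`, so `word in top_words` would never match the `str` entries built by `CountVectorizer` unless the token is decoded first. Also, `buffering=0` returns an unbuffered raw file; in CPython, iterating over it line by line reads one byte at a time, which is much slower on a large embedding file. A sketch of a binary-mode variant that keeps the default buffering and decodes only the word token, assuming UTF-8 words; the function name and parameters are hypothetical:

```python
import os
import tempfile
from array import array


def get_embedding_binary(path, embedding_dim, top_words):
    """Hypothetical binary-mode variant of get_embedding()."""
    embeddings_index = {}
    # Default buffering; buffering=0 would make line iteration read byte by byte.
    with open(path, "rb") as f:
        for line in f:
            values = line.split()  # bytes tokens, e.g. [b"cat", b"0.5", b"0.25"]
            if not values:
                continue
            # Decode only the word so it can match the str entries in top_words.
            word = values[0].decode("utf-8")
            if len(values) == embedding_dim + 1 and word in top_words:
                # float() accepts ASCII bytes such as b"0.5" directly.
                embeddings_index[word] = array("f", map(float, values[1:]))
    return embeddings_index


with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("cat 0.5 0.25\n\u0410 1.0 2.0\n".encode("utf-8"))
index = get_embedding_binary(tmp.name, 2, {"cat", "\u0410"})
os.remove(tmp.name)
print(index["\u0410"].tolist())  # [1.0, 2.0]
```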
