UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 962: character maps to <undefined>
I am trying to run model.py but I am getting the following error:
```
D:\imad_web\kaggle-quora-dup_24_position>python model.py
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Creating the vocabulary of words occurred more than 100
Traceback (most recent call last):
  File "model.py", line 122, in <module>
    embeddings_index = get_embedding()
  File "model.py", line 55, in get_embedding
    for line in f:
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 962: character maps to <undefined>
```
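The byte named in the error can be checked in isolation: 0x90 has no assigned character in cp1252 (the Windows default codec being used here), while in UTF-8 the same byte is valid as part of a multi-byte sequence. A small demo (the sample bytes are illustrative, not taken from the actual embedding file):

```python
# 0x90 is unassigned in cp1252, so decoding fails exactly as in the
# traceback; the same byte is fine in UTF-8 as the second byte of a
# multi-byte sequence (0xCE 0x90 encodes U+0390, 'ΐ').
data = b"\xce\x90"

try:
    data.decode("cp1252")
except UnicodeDecodeError as err:
    print(err)  # 'charmap' codec can't decode byte 0x90 ...

print(data.decode("utf-8"))  # ΐ
```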
```python
def get_embedding():
    embeddings_index = {}
    f = open(EMBEDDING_FILE)
    for line in f:  # line 55
        values = line.split()
        word = values[0]
        if len(values) == EMBEDDING_DIM + 1 and word in top_words:
            coefs = np.asarray(values[1:], dtype="float32")
            embeddings_index[word] = coefs
    f.close()
    return embeddings_index
```
```python
vectorizer = CountVectorizer(lowercase=False, token_pattern="\S+",
                             min_df=MIN_WORD_OCCURRENCE)
vectorizer.fit(all_questions)
top_words = set(vectorizer.vocabulary_.keys())
top_words.add(REPLACE_WORD)

embeddings_index = get_embedding()  # line 122
print("Words are not found in the embedding:", top_words - embeddings_index.keys())
top_words = embeddings_index.keys()
```
No, it's not a bug. Anyway, I've found the solution: I opened the file with `f = open(EMBEDDING_FILE, encoding="utf-8")` and now it's working 😃
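For reference, here is a minimal, self-contained sketch of `get_embedding` with that fix applied; `EMBEDDING_DIM`, the demo vocabulary, and the temporary file are placeholders for the real GloVe file and settings:

```python
import os
import tempfile

import numpy as np

EMBEDDING_DIM = 3  # placeholder; the real model uses the GloVe dimension


def get_embedding(path, top_words):
    """Parse a GloVe-style text file: one word plus its vector per line."""
    embeddings_index = {}
    # explicit encoding avoids falling back to cp1252 on Windows
    with open(path, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            word = values[0]
            if len(values) == EMBEDDING_DIM + 1 and word in top_words:
                embeddings_index[word] = np.asarray(values[1:], dtype="float32")
    return embeddings_index


# tiny demo file with a non-ASCII token to show why the encoding matters
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("naïve 0.1 0.2 0.3\nhello 0.4 0.5 0.6\n")
    path = tmp.name

index = get_embedding(path, {"naïve", "hello"})
print(sorted(index))  # ['hello', 'naïve']
os.remove(path)
```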
f = open(EMBEDDING_FILE, "rb", buffering=0)
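Note that opening in binary mode, as in that last suggestion, avoids the codec error but yields `bytes` objects, so each line must be decoded before calling string methods on it; a quick illustration (the sample line is made up):

```python
raw = b"na\xc3\xafve 0.1 0.2\n"  # one line as read from an "rb" file handle
line = raw.decode("utf-8")       # bytes -> str before using .split()
print(line.split())              # ['naïve', '0.1', '0.2']
```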