Warning: org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode

Summary of the issue

The warning appears while the CSV is being created. I am reading a PDF to extract its text and writing that text to a CSV file. No error is raised, only this warning.

The exact warning is

Nov 08, 2017 10:48:48 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for .notdef (9) in font Helvetica

Environment

Python version: Python 2.7.13

Java version:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Java: java -h works fine.

OS: Ubuntu 17.04

What did you expect to happen?

The warning should not appear. Everything else works as intended.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
lukehsiao commented, Feb 14, 2018

@chezou Is there a way to silence these warnings?
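
One possible way to silence these warnings, sketched under some assumptions: PDFBox 2.x logs through Apache Commons Logging, and Commons Logging can be pointed at its built-in NoOpLog class through a JVM system property, which discards every log message. The helper name build_java_command and the jar and PDF paths below are hypothetical and only illustrate where the flag goes on the java command line. The thread does not confirm that this is how the maintainers resolved the issue.

```python
# JVM system property read by Apache Commons Logging. NoOpLog drops all
# log output, which includes the "No Unicode mapping" warnings.
NO_OP_LOG_FLAG = (
    "-Dorg.apache.commons.logging.Log="
    "org.apache.commons.logging.impl.NoOpLog"
)


def build_java_command(jar_path, pdf_path, silence_logging=True):
    """Return the argument list for running a PDFBox-based jar on one PDF.

    jar_path and pdf_path are placeholders (assumptions for this sketch).
    JVM options must come before -jar, otherwise they are passed to the
    program instead of the JVM.
    """
    cmd = ["java"]
    if silence_logging:
        cmd.append(NO_OP_LOG_FLAG)
    cmd.extend(["-jar", jar_path, pdf_path])
    return cmd
```

The resulting list can be passed to subprocess.run. If the extraction goes through tabula-py (an assumption, since the report does not name the library), the same flag can be supplied with read_pdf(..., java_options=[NO_OP_LOG_FLAG]). Note that this hides all PDFBox log output, not only this warning.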

0 reactions
pri1311 commented, Jul 16, 2020

I had a similar problem with a PDF file that used the Verdana font. I converted it to a Word (docx) file, changed the font throughout, and converted it back to PDF. It worked well for me after that. It could be a good hack for small projects or small amounts of data.

