No /Root object! - Is this really a PDF?

Hi,

I’ve got this PDF (see attachment) which opens just fine in a PDF viewer, but pdfminer fails to parse it:

PDFSyntaxError                            Traceback (most recent call last)

<ipython-input-21-661fe9476e35> in <module>()
      7 device = TextConverter(rsrcmgr, outfp, codec="utf-8", laparams=LAParams())
      8 interpreter = PDFPageInterpreter(rsrcmgr, device)
----> 9 for page in PDFPage.get_pages(fp, pagenos=set(), caching=True, check_extractable=True):
     10     interpreter.process_page(page)
     11 device.close()

/usr/local/lib/python3.6/dist-packages/pdfminer/pdfpage.py in get_pages(cls, fp, pagenos, maxpages, password, caching, check_extractable)
    126         parser = PDFParser(fp)
    127         # Create a PDF document object that stores the document structure.
--> 128         doc = PDFDocument(parser, password=password, caching=caching)
    129         # Check if the document allows text extraction.
    130         # If not, warn the user and proceed.

/usr/local/lib/python3.6/dist-packages/pdfminer/pdfdocument.py in __init__(self, parser, password, caching, fallback)
    594                 break
    595         else:
--> 596             raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
    597         if self.catalog.get('Type') is not LITERAL_CATALOG:
    598             if settings.STRICT:

PDFSyntaxError: No /Root object! - Is this really a PDF?

Steps to reproduce the bug:

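from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
f = PDFParser(open('W020160201380721221093.pdf', 'rb'))
doc = PDFDocument(f)  # raises PDFSyntaxError: No /Root object!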

Attached file: W020160201380721221093.pdf
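The message comes from the `for ... else` in `PDFDocument.__init__` visible in the traceback: pdfminer goes through the cross-reference sections it managed to load and raises only when none of their trailers contains a `/Root` entry. Since the file opens in other viewers, looking at the raw bytes can help narrow down what is damaged. The sketch below is a rough, stdlib-only heuristic, not pdfminer's own logic, and `diagnose_root` and its result keys are hypothetical names. It assumes the `/Root` reference sits in an uncompressed trailer or cross-reference stream dictionary (the stream data may be compressed, the dictionary is not), and that the offset after the last `startxref` should point at either an `xref` table or an object header.

```python
import re

# An indirect reference such as "/Root 12 0 R" in a trailer or xref-stream dictionary.
ROOT_REF = re.compile(rb"/Root\s+(\d+)\s+(\d+)\s+R")
STARTXREF = re.compile(rb"startxref\s+(\d+)")
OBJ_HEADER = re.compile(rb"\d+\s+\d+\s+obj")


def diagnose_root(data: bytes) -> dict:
    """Rough byte-level check of the structures a reader uses to find /Root (heuristic)."""
    # Readers commonly tolerate a little junk before the "%PDF-" header.
    has_header = data[:1024].find(b"%PDF-") != -1
    root_refs = [(int(m.group(1)), int(m.group(2))) for m in ROOT_REF.finditer(data)]
    offsets = [int(m.group(1)) for m in STARTXREF.finditer(data)]
    # With incremental updates there can be several; readers start from the last one.
    last = offsets[-1] if offsets else None
    target_ok = False
    if last is not None and last < len(data):
        head = data[last:last + 32].lstrip()
        # Expected target: a classic "xref" table, or an "N G obj" header
        # (which, for newer files, should be a cross-reference stream).
        target_ok = head.startswith(b"xref") or OBJ_HEADER.match(head) is not None
    return {
        "has_header": has_header,
        "root_refs": root_refs,
        "startxref": last,
        "startxref_target_ok": target_ok,
    }
```

For example, `diagnose_root(open("W020160201380721221093.pdf", "rb").read())` returns the four fields. An empty `root_refs` hints that the trailer itself is missing or damaged; a non-empty `root_refs` with `startxref_target_ok` set to `False` hints at a wrong `startxref` offset, which is the kind of damage a full rewrite such as the Ghostscript command in the comments can paper over.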

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
samkit-jain commented, Jun 22, 2022

@micmalti I was able to resolve this issue by repairing the PDF via Ghostscript. Command I ran:

gs -o "output.pdf" -sDEVICE=pdfwrite input.pdf

The repaired PDF was attached to the comment.

Is this something that pdfminer should be able to handle natively? I don’t know.
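The Ghostscript workaround can be automated so that a repaired copy is used only when pdfminer fails. A minimal sketch, assuming the `gs` binary is on `PATH` and pdfminer.six is installed; `ghostscript_repair_cmd`, `gs_repair`, `parse_or_repair`, `count_pages_with_repair` and the default output name `repaired.pdf` are hypothetical. It reuses the exact command from the comment above and retries only on `PDFSyntaxError`, so other failures (a missing file, a Ghostscript error) still surface unchanged.

```python
import subprocess


def ghostscript_repair_cmd(src, dst):
    """Build the command from the comment: rewrite src through pdfwrite into dst."""
    return ["gs", "-o", str(dst), "-sDEVICE=pdfwrite", str(src)]


def gs_repair(src, dst):
    # Requires the `gs` binary on PATH; raises CalledProcessError if Ghostscript fails.
    subprocess.run(ghostscript_repair_cmd(src, dst), check=True)


def parse_or_repair(src, dst, parse, repair, retry_on):
    """Return parse(src); if that raises retry_on, run repair(src, dst) and return parse(dst)."""
    try:
        return parse(src)
    except retry_on:
        repair(src, dst)
        return parse(dst)


def count_pages_with_repair(src, dst="repaired.pdf"):
    # pdfminer.six is imported lazily so the helpers above work without it.
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfparser import PDFParser, PDFSyntaxError

    def count_pages(path):
        # Keep the file open while pages are resolved; PDFDocument reads objects lazily.
        with open(path, "rb") as fp:
            doc = PDFDocument(PDFParser(fp))
            return sum(1 for _ in PDFPage.create_pages(doc))

    return parse_or_repair(src, dst, count_pages, gs_repair, PDFSyntaxError)
```

For example, `count_pages_with_repair("W020160201380721221093.pdf")` parses the original file first and only invokes Ghostscript if pdfminer raises `PDFSyntaxError`. Because `pdfwrite` regenerates the document rather than patching it, text extracted from the repaired copy is worth spot-checking against the original in a viewer.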

1 reaction
FelixSchwarz commented, Aug 23, 2022

@samkit-jain Thank you for the workaround - that helped me tremendously.
