No /Root object! - Is this really a PDF?

Hi,

I’ve got this PDF (see attachment) which opens just fine in a PDF viewer but fails to get parsed:

PDFSyntaxError                            Traceback (most recent call last)

<ipython-input-21-661fe9476e35> in <module>()
      7 device = TextConverter(rsrcmgr, outfp, codec="utf-8", laparams=LAParams())
      8 interpreter = PDFPageInterpreter(rsrcmgr, device)
----> 9 for page in PDFPage.get_pages(fp, pagenos=set(), caching=True, check_extractable=True):
     10     interpreter.process_page(page)
     11 device.close()

1 frames

/usr/local/lib/python3.6/dist-packages/pdfminer/pdfpage.py in get_pages(cls, fp, pagenos, maxpages, password, caching, check_extractable)
    126         parser = PDFParser(fp)
    127         # Create a PDF document object that stores the document structure.
--> 128         doc = PDFDocument(parser, password=password, caching=caching)
    129         # Check if the document allows text extraction.
    130         # If not, warn the user and proceed.

/usr/local/lib/python3.6/dist-packages/pdfminer/pdfdocument.py in __init__(self, parser, password, caching, fallback)
    594                 break
    595         else:
--> 596             raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
    597         if self.catalog.get('Type') is not LITERAL_CATALOG:
    598             if settings.STRICT:

PDFSyntaxError: No /Root object! - Is this really a PDF?

Steps to reproduce the bug:

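from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfparser import PDFParser

parser = PDFParser(open(pdf, 'rb'))  # pdf: path to the attached file
doc = PDFDocument(parser)  # raises PDFSyntaxError here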

W020160201380721221093.pdf
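
For anyone hitting the same error, a quick byte-level look at the file can hint at whether the trailer information pdfminer is looking for is present at all. The following is only a rough sketch using the standard library: the function names and the 1024-byte search windows are my own assumptions, and a file can pass these checks and still be damaged in ways a viewer tolerates but a parser does not.

```python
from pathlib import Path


def quick_pdf_checks(data: bytes) -> dict:
    """Rough byte-level heuristics; this is not a PDF validator."""
    return {
        # A PDF normally starts with a "%PDF-" header near the beginning.
        "has_header": b"%PDF-" in data[:1024],
        # "startxref" points at the cross-reference section.
        "has_startxref": b"startxref" in data,
        # The trailer (or xref stream dictionary) should carry a /Root key.
        "has_root_key": b"/Root" in data,
        # Files normally end with an "%%EOF" marker.
        "has_eof_marker": b"%%EOF" in data[-1024:],
    }


def quick_pdf_checks_for_file(path) -> dict:
    """Convenience wrapper (hypothetical helper) that reads the file first."""
    return quick_pdf_checks(Path(path).read_bytes())
```

If has_root_key comes back False, the "No /Root object!" message is at least consistent with what is in the file; if everything is True, the cross-reference data is more likely broken or in a form the parser could not follow.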

Issue Analytics

Created 3 years ago
Reactions:2
Comments:6 (3 by maintainers)

Top GitHub Comments

2 reactions

samkit-jain commented, Jun 22, 2022

@micmalti I was able to resolve this issue by repairing the PDF via Ghostscript. Command I ran:

gs -o "output.pdf" -sDEVICE=pdfwrite input.pdf

The repaired PDF.

Is this something that pdfminer should be able to handle natively? I don’t know.
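
The Ghostscript repair step above could also be scripted, for example to pre-process files before handing them to pdfminer. This is a minimal sketch, assuming Ghostscript is installed and available as gs on the PATH (on Windows the executable usually has a different name, such as gswin64c). The helper names build_gs_repair_command and repair_pdf are hypothetical, not part of pdfminer or Ghostscript.

```python
import shutil
import subprocess


def build_gs_repair_command(src, dst, gs_binary="gs"):
    """Build the same command line as above: gs -o DST -sDEVICE=pdfwrite SRC."""
    return [gs_binary, "-o", str(dst), "-sDEVICE=pdfwrite", str(src)]


def repair_pdf(src, dst, gs_binary="gs"):
    """Rewrite src through Ghostscript's pdfwrite device and return dst.

    Raises FileNotFoundError if the Ghostscript binary is not on the PATH,
    and subprocess.CalledProcessError if Ghostscript exits with an error.
    """
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"Ghostscript binary not found: {gs_binary}")
    subprocess.run(build_gs_repair_command(src, dst, gs_binary), check=True)
    return dst
```

The rewritten file can then be opened with PDFParser and PDFDocument as in the steps to reproduce. Note that pdfwrite regenerates the document, so the output is a re-created PDF and not a byte-for-byte fix of the original.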

1 reaction

FelixSchwarz commented, Aug 23, 2022

@samkit-jain Thank you for the workaround - that helped me tremendously.
