No /Root object! - Is this really a PDF?
Hi,
I’ve got this PDF (see attachment) which opens just fine in a PDF viewer but fails to get parsed:
PDFSyntaxError Traceback (most recent call last)
<ipython-input-21-661fe9476e35> in <module>()
7 device = TextConverter(rsrcmgr, outfp, codec="utf-8", laparams=LAParams())
8 interpreter = PDFPageInterpreter(rsrcmgr, device)
----> 9 for page in PDFPage.get_pages(fp, pagenos=set(), caching=True, check_extractable=True):
10 interpreter.process_page(page)
11 device.close()
1 frames
/usr/local/lib/python3.6/dist-packages/pdfminer/pdfpage.py in get_pages(cls, fp, pagenos, maxpages, password, caching, check_extractable)
126 parser = PDFParser(fp)
127 # Create a PDF document object that stores the document structure.
--> 128 doc = PDFDocument(parser, password=password, caching=caching)
129 # Check if the document allows text extraction.
130 # If not, warn the user and proceed.
/usr/local/lib/python3.6/dist-packages/pdfminer/pdfdocument.py in __init__(self, parser, password, caching, fallback)
594 break
595 else:
--> 596 raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
597 if self.catalog.get('Type') is not LITERAL_CATALOG:
598 if settings.STRICT:
PDFSyntaxError: No /Root object! - Is this really a PDF?
Steps to reproduce the bug:
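from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
f = PDFParser(open(pdf, 'rb'))  # pdf is the path to the attached file
doc = PDFDocument(f)  # raises PDFSyntaxError: No /Root object!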
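For anyone else hitting this: the error is raised when none of the trailers that pdfminer finds contains a /Root entry, so a quick look at the raw bytes can help tell a damaged file from a parser limitation. Below is a rough sketch, assuming a hypothetical helper named `inspect_pdf_bytes` (it is not part of pdfminer). It only searches for marker strings, so treat the result as a hint, not a validation.

```python
def inspect_pdf_bytes(data: bytes) -> dict:
    """Report which basic PDF structure markers appear in the raw bytes.

    Hypothetical helper, not part of pdfminer: a crude byte search only.
    """
    return {
        # The "%PDF-" header is expected at offset 0; some viewers
        # tolerate leading junk, which may explain "opens fine in a viewer".
        "header_at_start": data.startswith(b"%PDF-"),
        "header_offset": data.find(b"%PDF-"),  # -1 if there is no header at all
        "has_startxref": b"startxref" in data,
        # The "trailer" keyword can be legitimately absent in files that
        # use cross-reference streams (PDF 1.5 and later).
        "has_trailer_keyword": b"trailer" in data,
        "has_root_key": b"/Root" in data,
        "has_eof_marker": b"%%EOF" in data,
    }


# Usage (path is whatever file you are testing):
# with open("problem.pdf", "rb") as fp:
#     print(inspect_pdf_bytes(fp.read()))
```

If `has_root_key` is true but parsing still fails, the cross-reference data is probably what is broken, and rewriting the file with another tool is a reasonable next step.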
@micmalti I was able to resolve this issue by repairing the PDF via Ghostscript. Command I ran:
The repaired PDF.
Is this something that pdfminer should be able to handle natively? I don’t know.
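The exact command is not preserved in this copy of the thread. As a general illustration only, one common way to have Ghostscript rewrite a PDF is to run it with the `pdfwrite` device, which re-emits the document with fresh cross-reference data. The sketch below wraps that in Python. The function names and file names are hypothetical, and the flags are an assumption about a typical invocation, not necessarily what was run above.

```python
import shutil
import subprocess


def build_gs_command(src: str, dst: str, gs_binary: str = "gs") -> list:
    """Build a Ghostscript command line that rewrites src into dst.

    Assumed invocation for illustration; not the command from the thread.
    """
    return [
        gs_binary,
        "-o", dst,            # output file; -o also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",  # re-emit the document as a new PDF
        src,
    ]


def repair_pdf(src: str, dst: str) -> None:
    """Run Ghostscript to rewrite a PDF (requires `gs` on PATH)."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript ('gs') not found on PATH")
    # check=True raises CalledProcessError if Ghostscript exits with an error.
    subprocess.run(build_gs_command(src, dst), check=True)


# Usage (hypothetical file names):
# repair_pdf("corrupted.pdf", "repaired.pdf")
```

The rewritten file can then be passed to `PDFParser` and `PDFDocument` as in the original report.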
@samkit-jain Thank you for the workaround - that helped me tremendously.