Error when reading GFF files from GenBank

Hi,

I got this issue when trying to process a batch of genome assemblies, including some that were downloaded directly from NCBI.

It appears that Panaroo cannot digest GFF files coming from GenBank, as I get this error:

Traceback (most recent call last):
  File "/usr/local/bin/panaroo", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/panaroo/__main__.py", line 273, in main
    args.n_cpu)
  File "/usr/local/lib/python3.7/site-packages/panaroo/prokka.py", line 229, in process_prokka_input
    raise RuntimeError("Error reading prokka input!")
RuntimeError: Error reading prokka input!

(The error was actually a bit longer when including the master process's error-handling report, but I don’t think that is relevant here.)

This was obtained using Panaroo version 1.2.3 (from the module installed on the Sanger farm).

One example of a GFF file that fails can be found in this assembly: https://www.ncbi.nlm.nih.gov/assembly/GCA_009746685.1

Do you have any suggestions for how this could be addressed? Is Panaroo only expecting GFF files as produced by Prokka? I used the GenBank version of the assembly; do you think it would work better with the RefSeq annotation?

Thank you in advance for the help.

Florent

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
flass commented, Sep 14, 2020

I confirm that processing the GFF files with the convert_refseq_to_prokka_gff.py script allowed me to circumvent the issue. Thanks again.

1 reaction
flass commented, Sep 14, 2020

Ah, sorry, I should have thought of that myself! I was confused by the fact that the script is provided as an executable file, so I assumed it would know the right interpreter by itself. I would suggest either adding a shebang such as #!/usr/bin/env python3 at the beginning of the script or not making the script executable, to avoid that mistake. When I ran it with the Python interpreter, I still got errors about missing packages in the environment; I solved that by installing the Conda environment specific to Panaroo (https://anaconda.org/bioconda/panaroo). The script now works fine.
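
For anyone processing a mixed batch, a quick pre-check can help spot which GFF files probably need the conversion script before Panaroo is run. Prokka writes the contig sequences into its GFF3 output after a ##FASTA line, whereas GFF files downloaded from NCBI normally come without that section. The sketch below is only a rough diagnostic under the assumption that the missing sequence section is what distinguishes the failing files here (the thread does not confirm the exact cause); has_embedded_fasta and files_needing_conversion are hypothetical helper names, not part of Panaroo.

```python
import sys


def has_embedded_fasta(gff_path):
    """Return True if the GFF file contains a '##FASTA' directive line.

    Assumption: Prokka-style GFF3 files embed their sequences after this
    directive, while NCBI downloads usually do not include it.
    """
    with open(gff_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "##FASTA":
                return True
    return False


def files_needing_conversion(gff_paths):
    """Return the paths that lack an embedded FASTA section."""
    return [path for path in gff_paths if not has_embedded_fasta(path)]


if __name__ == "__main__":
    # Usage: python check_gff.py file1.gff file2.gff ...
    for path in files_needing_conversion(sys.argv[1:]):
        print(f"{path}: no ##FASTA section, probably needs conversion")
```

The check only looks for the directive line; it does not validate the annotation itself, and gzipped downloads would need to be decompressed first.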
