funannotate test - Not enough gene models to train Augustus


Are you using the latest release? Yep (I think)! Installed through Conda

funannotate --version
funannotate v1.8.3

Describe the bug

I ran the test set, and I got the following error:

[06:02 PM]: Not enough gene models 175 to train Augustus (200 required), exiting

This might be related to https://github.com/nextgenusfs/funannotate/issues/418 (tblastn exiting early), but there isn’t a way to pass --min_training_models to funannotate test. I’m trying a rerun with 1 CPU to rule out multithreaded tblastn causing BUSCO to exit early. Will let you know if it succeeds or not.
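For context, the failing check is a simple threshold. funannotate compares the number of validated BUSCO gene models against a required minimum (200 in this run, per the log) and exits if there are too few. Below is a minimal Python sketch of that comparison. It is not funannotate's code: both helpers are hypothetical. The parameter name mirrors the --min_training_models option mentioned above, assuming that option is what sets this threshold in funannotate predict.

```python
def enough_training_models(n_models: int, min_training_models: int = 200) -> bool:
    """Return True when there are enough gene models to train Augustus.

    Hypothetical helper: the default of 200 comes from the log message, and
    the parameter name mirrors the --min_training_models option.
    """
    return n_models >= min_training_models


def training_message(n_models: int, min_training_models: int = 200) -> str:
    """Build a status line modelled on the funannotate log (hypothetical wording)."""
    if enough_training_models(n_models, min_training_models):
        return f"{n_models} gene models available ({min_training_models} required)"
    return (
        f"Not enough gene models {n_models} to train Augustus "
        f"({min_training_models} required), exiting"
    )


# 175 validated BUSCO models, as reported in the failing run
print(training_message(175))
# A lower threshold (assumed to be what --min_training_models controls) would pass
print(training_message(175, min_training_models=150))
```

With the default threshold, the first call reproduces the error line from the log. The second shows why being able to lower the minimum would have let the BUSCO test continue.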

What command did you issue?

funannotate test -t all --cpus 8         

Logfiles

funannotate test -t all --cpus 8
#########################################################
Running `funannotate clean` unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
-----------------------------------------------
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
-----------------------------------------------
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
-----------------------------------------------
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: `funannotate clean` test complete.
#########################################################

#########################################################
Running `funannotate mask` unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 8
#########################################################
-------------------------------------------------------
[05:07 PM]: OS: Debian GNU/Linux 10, 24 cores, ~ 264 GB RAM. Python: 3.8.6
[05:07 PM]: Running funanotate v1.8.3
[05:07 PM]: Soft-masking simple repeats with tantan
[05:07 PM]: Repeat soft-masking finished:
Masked genome: /home/rpetit3/test-genemark/test-mask_3801/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
-------------------------------------------------------
#########################################################
SUCCESS: `funannotate mask` test complete.
#########################################################

#########################################################
Running `funannotate predict` unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 8 --species Awesome testicus
#########################################################
-------------------------------------------------------
[05:07 PM]: OS: Debian GNU/Linux 10, 24 cores, ~ 264 GB RAM. Python: 3.8.6
[05:07 PM]: Running funannotate v1.8.3
[05:07 PM]: Skipping CodingQuarry as no --rna_bam passed
[05:07 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  genemark     selftraining
  glimmerhmm   busco
  snap         busco
[05:07 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[05:07 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[05:07 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[05:07 PM]: Found 1,784 preliminary alignments --> aligning with exonerate
[05:09 PM]: Exonerate finished: found 1,433 alignments
[05:09 PM]: Running GeneMark-ES on assembly
[05:09 PM]: GeneMark-ES failed: annotate/predict_misc/genemark/output/gmhmm.mod file missing, please check logfiles.
[05:09 PM]: GeneMark predictions failed. If you can run GeneMark outside of funannotate, then pass the results to --genemark_gtf.
[05:09 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[05:27 PM]: 175 valid BUSCO predictions found, validating protein sequences
[05:28 PM]: 175 BUSCO predictions validated
[05:28 PM]: Running Augustus gene prediction using saccharomyces parameters
[05:33 PM]: 1,492 predictions from Augustus
[05:33 PM]: Pulling out high quality Augustus predictions
[05:33 PM]: Found 372 high quality predictions from Augustus (>90% exon evidence)
[05:33 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[05:34 PM]: 0 predictions from SNAP
[05:34 PM]: SNAP prediction failed, moving on without result
[05:34 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[05:38 PM]: 1,597 predictions from GlimmerHMM
[05:38 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1331
  Augustus HiQ   2        373
  GlimmerHMM     1        1597
  Total          -        3301
[05:38 PM]: EVM: partitioning input to ~ 35 genes per partition
[05:46 PM]: Converting to GFF3 and collecting all EVM results
[05:46 PM]: 1,727 total gene models from EVM
[05:46 PM]: Generating protein fasta files from 1,727 EVM models
[05:46 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[05:46 PM]: Found 154 gene models to remove: 0 too short; 0 span gaps; 154 transposable elements
[05:46 PM]: 1,573 gene models remaining
[05:46 PM]: Predicting tRNAs
[05:47 PM]: 112 tRNAscan models are valid (non-overlapping)
[05:47 PM]: Generating GenBank tbl annotation file
[05:48 PM]: Converting to final Genbank format
[05:48 PM]: Collecting final annotation files for 1,685 total gene models
[05:48 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[05:48 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required):
funannotate iprscan -i annotate -m docker -c 8

Run antiSMASH:
funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome:
funannotate annotate -i annotate --cpus 8 --sbt yourSBTfile.txt
-------------------------------------------------------

[05:48 PM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[05:48 PM]: Add species parameters to database:

  funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################

#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 8 --species Awesome busco
#########################################################
-------------------------------------------------------
[05:48 PM]: OS: Debian GNU/Linux 10, 24 cores, ~ 264 GB RAM. Python: 3.8.6
[05:48 PM]: Running funannotate v1.8.3
[05:48 PM]: Skipping CodingQuarry as no --rna_bam passed
[05:48 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  genemark     selftraining
  glimmerhmm   busco
  snap         busco
[05:48 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[05:48 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[05:48 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[05:48 PM]: Found 1,784 preliminary alignments --> aligning with exonerate
[05:49 PM]: Exonerate finished: found 1,435 alignments
[05:49 PM]: Running GeneMark-ES on assembly
[05:49 PM]: GeneMark-ES failed: annotate/predict_misc/genemark/output/gmhmm.mod file missing, please check logfiles.
[05:49 PM]: GeneMark predictions failed. If you can run GeneMark outside of funannotate, then pass the results to --genemark_gtf.
[05:49 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[06:02 PM]: 175 valid BUSCO predictions found, validating protein sequences
[06:02 PM]: 175 BUSCO predictions validated
[06:02 PM]: Not enough gene models 175 to train Augustus (200 required), exiting
#########################################################
Traceback (most recent call last):
  File "/home/rpetit3/miniconda3/envs/funannotate/bin/funannotate", line 713, in <module>
    main()
  File "/home/rpetit3/miniconda3/envs/funannotate/bin/funannotate", line 703, in main
    mod.main(arguments)
  File "/home/rpetit3/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 406, in main
    runBuscoTest(args)
  File "/home/rpetit3/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 199, in runBuscoTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/home/rpetit3/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 44, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-busco_3801/annotate/predict_results/Awesome_busco.gff3'
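The traceback shows that test.py intends to assert at least 1,500 gene models (`assert 1500 <= countGFFgenes(...)`). However, funannotate predict had already exited, so Awesome_busco.gff3 was never written and `open()` raised FileNotFoundError before the assertion could run. Below is a hedged Python sketch of a gene counter that reports this case more clearly. `count_gff_genes` is a hypothetical stand-in, not funannotate's actual countGFFgenes implementation. It assumes a tab-separated GFF3 file in which genes carry the `gene` type in column 3.

```python
import os
import tempfile


def count_gff_genes(path: str) -> int:
    """Count 'gene' features (column 3) in a GFF3 file.

    Hypothetical stand-in for countGFFgenes() in funannotate/test.py. It checks
    for the file first, so a missing output points back at the predict step.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} was not written; check whether funannotate predict "
            "exited early (e.g. too few training models)"
        )
    count = 0
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, pragmas and comments
            columns = line.rstrip("\n").split("\t")
            if len(columns) >= 3 and columns[2] == "gene":
                count += 1
    return count


# Tiny demo GFF3: two genes and one mRNA child
demo_lines = [
    "##gff-version 3",
    "scaffold_1\tdemo\tgene\t1\t900\t.\t+\t.\tID=g1",
    "scaffold_1\tdemo\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
    "scaffold_1\tdemo\tgene\t1200\t2100\t.\t-\t.\tID=g2",
]
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo.gff3")
    with open(demo_path, "w") as out:
        out.write("\n".join(demo_lines) + "\n")
    n_demo = count_gff_genes(demo_path)
    # Simulate the missing predict output from the traceback
    missing_path = os.path.join(tmpdir, "Awesome_busco.gff3")
    try:
        count_gff_genes(missing_path)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(n_demo, missing_raised)  # 2 True
```

The root cause is still the early exit in predict. The guard only makes the test failure easier to trace back to it.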

funannotate check output

OS/Install Information

OS info:

uname -a
Linux loma 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux

Versions info

funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.3
-------------------------------------------------------
You are running Python v 3.8.6. Now checking python packages...
biopython: 1.78
goatools: 1.0.15
matplotlib: 3.3.4
natsort: 7.1.1
numpy: 1.20.1
pandas: 1.2.2
psutil: 5.8.0
requests: 2.25.1
scikit-learn: 0.24.1
scipy: 1.6.0
seaborn: 0.11.1
All 11 python packages installed


You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed


Checking Environmental Variables...
$FUNANNOTATE_DB=/home/rpetit3/funannotate_db
$PASAHOME=/home/rpetit3/miniconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/home/rpetit3/miniconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/home/rpetit3/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/rpetit3/miniconda3/envs/funannotate/config/
$GENEMARK_PATH=/home/rpetit3/miniconda3/envs/funannotate/share/genemark
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.4.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.7
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.475 (2020/Nov/23)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.17-r941
proteinortho: 6.0.28
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.10
snap: 2006-07-28
stringtie: 2.1.4
tRNAscan-SE: 2.0.7 (Oct 2020)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
        ERROR: emapper.py not installed
        ERROR: gmes_petap.pl not installed
        ERROR: signalp not installed

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

nextgenusfs commented, Feb 18, 2021

Downgrade Augustus: something changed in the newest version, and its output is failing in BUSCO. Any version < 3.4 should work.
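To check whether an install is affected, compare the Augustus version reported by funannotate check (augustus: 3.4.0 in the output above) against the < 3.4 cut-off from this comment. The Python sketch below does this. Both helper names are hypothetical, and the cut-off comes only from the maintainer's advice, not from Augustus or BUSCO documentation.

```python
import re


def parse_version(version_str: str) -> tuple:
    """Extract the first dotted number, e.g. 'augustus: 3.4.0' -> (3, 4, 0)."""
    match = re.search(r"(\d+(?:\.\d+)*)", version_str)
    if match is None:
        raise ValueError(f"no version number in {version_str!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def augustus_ok_for_busco(version_str: str) -> bool:
    """True if Augustus is older than 3.4 (cut-off taken from the comment above)."""
    # Tuple comparison: (3, 3, 3) < (3, 4) is True, (3, 4, 0) < (3, 4) is False
    return parse_version(version_str) < (3, 4)


print(augustus_ok_for_busco("augustus: 3.4.0"))  # False
print(augustus_ok_for_busco("augustus: 3.3.3"))  # True
```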

rpetit3 commented, Feb 18, 2021

I’ll update the recipe.

And yeah, I agree it’s probably GeneMark.

I’ll close this for now. Thanks for the help, Jon!
