funannotate test - Not enough gene models to train Augustus
Are you using the latest release? Yep (I think)! Installed through Conda
funannotate --version
funannotate v1.8.3
Describe the bug
I ran the test set and got the following error:
[06:02 PM]: Not enough gene models 175 to train Augustus (200 required), exiting
This might be related to https://github.com/nextgenusfs/funannotate/issues/418 (tblastn exiting early), but there isn’t a way to set --min_training_models. I’m trying a rerun with 1 CPU to rule out multithreaded tblastn causing BUSCO to exit early. Will let you know if it succeeds or not.
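To make the shortfall explicit, here is a minimal sketch that pulls the two numbers out of the error line. The pattern is copied from the message in this report and may not match other funannotate versions; `training_shortfall` is a hypothetical helper name, not part of funannotate.

```python
import re

# Wording taken from the error line in this report; other versions may differ.
PATTERN = re.compile(
    r"Not enough gene models (\d+) to train Augustus \((\d+) required\)"
)


def training_shortfall(log_text):
    """Return (found, required, missing), or None if the message is absent."""
    match = PATTERN.search(log_text)
    if match is None:
        return None
    found, required = int(match.group(1)), int(match.group(2))
    return found, required, required - found


line = "[06:02 PM]: Not enough gene models 175 to train Augustus (200 required), exiting"
print(training_shortfall(line))  # (175, 200, 25)
```

So the BUSCO step came up 25 validated models short of the 200 that the message says are required.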
What command did you issue?
funannotate test -t all --cpus 8
Logfiles
funannotate test -t all --cpus 8
#########################################################
Running `funannotate clean` unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
-----------------------------------------------
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
-----------------------------------------------
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
-----------------------------------------------
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: `funannotate clean` test complete.
#########################################################
#########################################################
Running `funannotate mask` unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 8
#########################################################
-------------------------------------------------------
[05:07 PM]: OS: Debian GNU/Linux 10, 24 cores, ~ 264 GB RAM. Python: 3.8.6
[05:07 PM]: Running funanotate v1.8.3
[05:07 PM]: Soft-masking simple repeats with tantan
[05:07 PM]: Repeat soft-masking finished:
Masked genome: /home/rpetit3/test-genemark/test-mask_3801/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
-------------------------------------------------------
#########################################################
SUCCESS: `funannotate mask` test complete.
#########################################################
#########################################################
Running `funannotate predict` unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 8 --species Awesome testicus
#########################################################
-------------------------------------------------------
[05:07 PM]: OS: Debian GNU/Linux 10, 24 cores, ~ 264 GB RAM. Python: 3.8.6
[05:07 PM]: Running funannotate v1.8.3
[05:07 PM]: Skipping CodingQuarry as no --rna_bam passed
[05:07 PM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark selftraining
glimmerhmm busco
snap busco
[05:07 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[05:07 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[05:07 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[05:07 PM]: Found 1,784 preliminary alignments --> aligning with exonerate
[05:09 PM]: Exonerate finished: found 1,433 alignments
[05:09 PM]: Running GeneMark-ES on assembly
[05:09 PM]: GeneMark-ES failed: annotate/predict_misc/genemark/output/gmhmm.mod file missing, please check logfiles.
[05:09 PM]: GeneMark predictions failed. If you can run GeneMark outside of funannotate, then pass the results to --genemark_gtf.
[05:09 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[05:27 PM]: 175 valid BUSCO predictions found, validating protein sequences
[05:28 PM]: 175 BUSCO predictions validated
[05:28 PM]: Running Augustus gene prediction using saccharomyces parameters
[05:33 PM]: 1,492 predictions from Augustus
[05:33 PM]: Pulling out high quality Augustus predictions
[05:33 PM]: Found 372 high quality predictions from Augustus (>90% exon evidence)
[05:33 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[05:34 PM]: 0 predictions from SNAP
[05:34 PM]: SNAP prediction failed, moving on without result
[05:34 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[05:38 PM]: 1,597 predictions from GlimmerHMM
[05:38 PM]: Summary of gene models passed to EVM (weights):
Source Weight Count
Augustus 1 1331
Augustus HiQ 2 373
GlimmerHMM 1 1597
Total - 3301
[05:38 PM]: EVM: partitioning input to ~ 35 genes per partition
[05:46 PM]: Converting to GFF3 and collecting all EVM results
[05:46 PM]: 1,727 total gene models from EVM
[05:46 PM]: Generating protein fasta files from 1,727 EVM models
[05:46 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[05:46 PM]: Found 154 gene models to remove: 0 too short; 0 span gaps; 154 transposable elements
[05:46 PM]: 1,573 gene models remaining
[05:46 PM]: Predicting tRNAs
[05:47 PM]: 112 tRNAscan models are valid (non-overlapping)
[05:47 PM]: Generating GenBank tbl annotation file
[05:48 PM]: Converting to final Genbank format
[05:48 PM]: Collecting final annotation files for 1,685 total gene models
[05:48 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[05:48 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required):
funannotate iprscan -i annotate -m docker -c 8
Run antiSMASH:
funannotate remote -i annotate -m antismash -e youremail@server.edu
Annotate Genome:
funannotate annotate -i annotate --cpus 8 --sbt yourSBTfile.txt
-------------------------------------------------------
[05:48 PM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[05:48 PM]: Add species parameters to database:
funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json
#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################
#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 8 --species Awesome busco
#########################################################
-------------------------------------------------------
[05:48 PM]: OS: Debian GNU/Linux 10, 24 cores, ~ 264 GB RAM. Python: 3.8.6
[05:48 PM]: Running funannotate v1.8.3
[05:48 PM]: Skipping CodingQuarry as no --rna_bam passed
[05:48 PM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus busco
genemark selftraining
glimmerhmm busco
snap busco
[05:48 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[05:48 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[05:48 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[05:48 PM]: Found 1,784 preliminary alignments --> aligning with exonerate
[05:49 PM]: Exonerate finished: found 1,435 alignments
[05:49 PM]: Running GeneMark-ES on assembly
[05:49 PM]: GeneMark-ES failed: annotate/predict_misc/genemark/output/gmhmm.mod file missing, please check logfiles.
[05:49 PM]: GeneMark predictions failed. If you can run GeneMark outside of funannotate, then pass the results to --genemark_gtf.
[05:49 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[06:02 PM]: 175 valid BUSCO predictions found, validating protein sequences
[06:02 PM]: 175 BUSCO predictions validated
[06:02 PM]: Not enough gene models 175 to train Augustus (200 required), exiting
#########################################################
Traceback (most recent call last):
File "/home/rpetit3/miniconda3/envs/funannotate/bin/funannotate", line 713, in <module>
main()
File "/home/rpetit3/miniconda3/envs/funannotate/bin/funannotate", line 703, in main
mod.main(arguments)
File "/home/rpetit3/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 406, in main
runBuscoTest(args)
File "/home/rpetit3/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 199, in runBuscoTest
assert 1500 <= countGFFgenes(os.path.join(
File "/home/rpetit3/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 44, in countGFFgenes
with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-busco_3801/annotate/predict_results/Awesome_busco.gff3'
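The traceback looks like a follow-on failure: predict exited before writing Awesome_busco.gff3, and the test then tried to open that file to count genes. Below is a rough sketch of a more defensive count, assuming a gene is any GFF3 row whose third column is `gene`. Both function names are hypothetical and this is not the code in funannotate's test.py.

```python
import os


def count_gene_features(lines):
    """Count rows whose feature type (third tab-separated column) is 'gene'."""
    count = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip GFF3 directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 3 and columns[2] == "gene":
            count += 1
    return count


def count_gff_genes_safe(path):
    """Return the gene count, or None when the GFF3 file was never written."""
    if not os.path.isfile(path):
        return None
    with open(path, "r") as handle:
        return count_gene_features(handle)


result = count_gff_genes_safe("test-busco_3801/annotate/predict_results/Awesome_busco.gff3")
if result is None:
    print("predict produced no GFF3; check the predict log before counting genes")
```

Returning None instead of raising keeps the real cause (the training failure a few lines earlier in the log) from being buried under a FileNotFoundError.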
funannotate check output
OS/Install Information
OS info:
uname -a
Linux loma 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux
Versions info
funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.3
-------------------------------------------------------
You are running Python v 3.8.6. Now checking python packages...
biopython: 1.78
goatools: 1.0.15
matplotlib: 3.3.4
natsort: 7.1.1
numpy: 1.20.1
pandas: 1.2.2
psutil: 5.8.0
requests: 2.25.1
scikit-learn: 0.24.1
scipy: 1.6.0
seaborn: 0.11.1
All 11 python packages installed
You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/home/rpetit3/funannotate_db
$PASAHOME=/home/rpetit3/miniconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/home/rpetit3/miniconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/home/rpetit3/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/rpetit3/miniconda3/envs/funannotate/config/
$GENEMARK_PATH=/home/rpetit3/miniconda3/envs/funannotate/share/genemark
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.4.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.7
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.475 (2020/Nov/23)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.17-r941
proteinortho: 6.0.28
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.10
snap: 2006-07-28
stringtie: 2.1.4
tRNAscan-SE: 2.0.7 (Oct 2020)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed
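The last three lines are worth a second look: gmes_petap.pl is the GeneMark-ES driver script, so its absence is consistent with the "GeneMark-ES failed" messages in the predict log. A small sketch for pulling the missing tools out of saved check output follows. It assumes the `ERROR: <tool> not installed` wording shown here, and `missing_tools` is a hypothetical helper.

```python
import re


def missing_tools(check_output):
    """Return tool names from lines of the form 'ERROR: <tool> not installed'."""
    return re.findall(
        r"^ERROR: (\S+) not installed[ \t]*$", check_output, flags=re.MULTILINE
    )


sample = """trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed"""

print(missing_tools(sample))  # ['emapper.py', 'gmes_petap.pl', 'signalp']
```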
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (2 by maintainers)
Downgrade Augustus. Something changed in the newest version and its output now fails in BUSCO, so any version < 3.4 should work.
I’ll update the recipe.
And yeah, I agree it’s probably GeneMark.
I’ll close this for now, thanks for the help Jon!
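For anyone checking their own environment against the advice above, here is a minimal sketch that reads the Augustus version from saved `funannotate check --show-versions` output and compares it to 3.4. The cutoff comes only from the comment in this thread ("anything < 3.4 should work"), the helper names are hypothetical, and the parsing assumes a plain dotted numeric version such as the `augustus: 3.4.0` line shown earlier.

```python
import re


def parse_version(text):
    """Turn a dotted numeric version such as '3.4.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def augustus_version_from_check(check_output):
    """Extract the version from an 'augustus: X.Y.Z' line, or None if absent."""
    match = re.search(r"^augustus: (\S+)[ \t]*$", check_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def augustus_needs_downgrade(version, first_bad="3.4"):
    """True when the version is at or above the cutoff mentioned in this thread."""
    return parse_version(version) >= parse_version(first_bad)


check_output = "augustus: 3.4.0\nbamtools: bamtools 2.5.1"
version = augustus_version_from_check(check_output)
print(version, augustus_needs_downgrade(version))  # 3.4.0 True
```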