andi - estimates evolutionary distances
andi [OPTIONS...] FILES...
andi estimates the evolutionary distance between closely related genomes.
For this, andi reads the input sequences from FASTA files and computes the
pairwise anchor distance. The idea behind this approach is explained in a
paper by Haubold et al. (2015).
The output is a symmetric distance matrix in PHYLIP format, with each entry
giving the divergence between two sequences as a non-negative real number. A
distance of zero means that the two sequences are identical. Other values are
estimates of the number of nucleotide substitutions per site (Jukes-Cantor
corrected by default). In some cases the comparison fails and no estimate can
be computed; andi then prints nan. This means that the input sequences were
either too short (<200 bp) or too diverse (K > 0.5) for the method to work
properly.
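Downstream scripts have to cope with nan entries in the PHYLIP matrix. The following Python sketch assumes that the first line holds the number of sequences n and that each row is a name without whitespace followed by n distances. The helpers `parse_phylip_matrix` and `failed_pairs` are hypothetical and not part of andi, and the example matrix uses made-up values.

```python
import math

def parse_phylip_matrix(text):
    """Parse one square PHYLIP distance matrix into (names, rows).

    Assumes the first non-empty line holds the number of sequences n and
    each following row is a name without whitespace plus n distances.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0].split()[0])
    names, rows = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        names.append(fields[0])
        # float("nan") is valid Python, so failed comparisons become NaN values
        rows.append([float(value) for value in fields[1:n + 1]])
    return names, rows

def failed_pairs(names, rows):
    """Return the pairs (upper triangle only) without a distance estimate."""
    return [(names[i], names[j])
            for i in range(len(names))
            for j in range(i + 1, len(names))
            if math.isnan(rows[i][j])]

# Made-up output for three sequences; seqA vs seqC could not be compared.
example = """3
seqA 0.0000 0.0100 nan
seqB 0.0100 0.0000 0.0200
seqC nan 0.0200 0.0000
"""
names, rows = parse_phylip_matrix(example)
print(failed_pairs(names, rows))  # [('seqA', 'seqC')]
```

NaN is kept rather than replaced with a number, so that a failed comparison is never mistaken for a real distance in later analyses.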
-b INT, --bootstrap=INT
    Compute multiple distance matrices, with n-1 of them bootstrapped from
    the first. See Klötzl & Haubold (2016) for a detailed explanation.
--file-of-filenames=FILE
    Usually, andi is called with the filenames as command-line arguments.
    With this option, the filenames may also be read from FILE, one name per
    line. Use a single dash ('-') to read the list from stdin.
-j, --join
    Use this mode if each of your FASTA files represents one assembly with
    numerous contigs. andi then treats all sequences contained in a file as a
    single genome. In this mode, at least one filename must be provided as a
    command-line argument. In the output, each genome is identified by its
    filename.
-l, --low-memory
    In multithreaded mode, andi requires memory proportional to the number of
    threads. The low-memory mode reduces this to a constant demand,
    independent of the number of threads. Unfortunately, this comes at a
    significant runtime cost.
-m MODEL, --model=MODEL
    Set the nucleotide evolution model to one of 'Raw', 'JC', 'Kimura', or
    'LogDet'. By default, the Jukes-Cantor correction ('JC') is used.
-p FLOAT
    Significance of an anchor; default: 0.025.
--progress[=WHEN]
    Print a progress bar. WHEN can be 'auto' (the default if omitted),
    'always', or 'never'.
-t INT, --threads=INT
    The number of threads to use; by default, all available processors are
    used. Multithreading is only available if andi was compiled with OpenMP
    support.
--truncate-names
    By default, andi outputs the full names of sequences, padded with spaces
    if they are shorter than ten characters. Names longer than ten characters
    may cause problems with downstream tools. With this switch, names are
    truncated.
-v, --verbose
    Print additional information, including the amount of homology found.
    Repeat the option for even more verbose output.
-h, --help
    Print the synopsis and an explanation of the available options.
--version
    Print version information and acknowledgments.
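With --bootstrap, andi produces several distance matrices. This sketch assumes they are written one after another, each starting with its own count line; check that against the output of your andi version. It splits the text into one block per matrix. The helper `split_matrices` is hypothetical.

```python
def split_matrices(text):
    """Split concatenated PHYLIP matrices into one text block per matrix.

    Assumption: every matrix starts with a line holding only the sequence
    count n and is followed by exactly n rows.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    blocks, start = [], 0
    while start < len(lines):
        n = int(lines[start].split()[0])
        end = start + n + 1          # count line plus n rows
        blocks.append("\n".join(lines[start:end]))
        start = end
    return blocks
```

Following the description of -b, the first block would hold the original estimate and the remaining blocks the bootstrap replicates.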
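The --file-of-filenames option expects one name per line, and '-' reads that list from stdin. This Python sketch builds such a list and pipes it to andi. It assumes an `andi` binary on the PATH; the helper names are hypothetical.

```python
import subprocess

def file_of_filenames_text(fasta_paths):
    """Return one path per line, the layout --file-of-filenames reads."""
    return "".join(f"{path}\n" for path in fasta_paths)

def andi_argv(list_file="-", *options):
    """Build the argument vector; '-' makes andi read the list from stdin."""
    return ["andi", f"--file-of-filenames={list_file}", *options]

def run_andi_from_list(fasta_paths, *options):
    """Run andi (assumed to be on PATH) and return its stdout."""
    result = subprocess.run(
        andi_argv("-", *options),
        input=file_of_filenames_text(fasta_paths),
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, run_andi_from_list(["g1.fa", "g2.fa"], "--model=Kimura") would return the PHYLIP matrix as a string; the file names here are placeholders. Note that -j additionally requires at least one filename on the command line, so it is left out of this sketch.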
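The 'Raw', 'JC' and 'Kimura' choices of --model presumably correspond to the standard textbook corrections:

- Raw is the mismatch proportion p = P + Q.
- JC is d = -3/4 ln(1 - 4p/3).
- Kimura is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

Here P is the proportion of transitions and Q the proportion of transversions. This sketch applies the formulas to a plain ungapped ACGT alignment. It is not andi's implementation: andi derives mismatches from anchor-based alignments (Haubold et al. 2015), and 'LogDet' is omitted. All function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def mismatch_proportions(seq1, seq2):
    """Return (P, Q), the proportions of transitions and transversions.

    Assumes an ungapped alignment over the alphabet ACGT only.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("need two non-empty sequences of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions / len(seq1), transversions / len(seq1)

def raw_distance(P, Q):
    """Uncorrected mismatch proportion p = P + Q."""
    return P + Q

def jukes_cantor(P, Q):
    """d = -3/4 ln(1 - 4p/3); undefined (NaN) once p reaches 3/4."""
    x = 1 - 4 * (P + Q) / 3
    return -0.75 * math.log(x) if x > 0 else math.nan

def kimura(P, Q):
    """Kimura two-parameter: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        return math.nan
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

Returning NaN where a logarithm's argument is not positive mirrors andi's choice of printing nan when no estimate can be computed. However, the K > 0.5 limit in the description is andi's own criterion, not this formula's.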
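The --truncate-names description implies a ten-character name field. This sketch reproduces that layout and flags names that would become identical once shortened. Truncating to exactly ten characters is an assumption about --truncate-names, and both helper names are hypothetical.

```python
def phylip_name(name, truncate=False, width=10):
    """Format a row label: pad short names with spaces, optionally cut long ones.

    The width of ten characters follows the option text; cutting to exactly
    `width` characters is an assumption about --truncate-names.
    """
    if truncate:
        name = name[:width]
    return name.ljust(width)

def truncation_collisions(names, width=10):
    """Map each shortened name to the full names that would share it."""
    groups = {}
    for name in names:
        groups.setdefault(name[:width], []).append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}
```

Checking for collisions before relying on truncated names is worthwhile: for example, genome_A_2019 and genome_A_2020 both shorten to genome_A_2 and could no longer be told apart in a tree.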
Copyright © 2014-2021 Fabian Klötzl. License GPLv3+: GNU GPL version 3 or
later.
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law. The full license text is available
at <http://gnu.org/licenses/gpl.html>.
1) andi: Haubold, B., Klötzl, F. and Pfaffelhuber, P. (2015). andi: Fast and
accurate estimation of evolutionary distances between closely related genomes.
Bioinformatics 31(8).
2) Algorithms: Ohlebusch, E. (2013). Bioinformatics Algorithms: Sequence
Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. pp. 118f.
3) SA construction: Mori, Y. (2005). libdivsufsort, unpublished.
4) Bootstrapping: Klötzl, F. and Haubold, B. (2016). Support Values for Genome
Phylogenies. Life 6(1).
Please report bugs to <[email protected]> or at
<https://github.com/EvolBioInf/andi>.