alf - Alignment-free sequence comparison
alf [OPTIONS] -i IN.FASTA [-o OUT.TXT]
Compute the pairwise similarity of the sequences in IN.FASTA using
alignment-free methods and write a tab-delimited matrix of pairwise scores
to OUT.TXT.
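As an illustration of the synopsis above, the following minimal sketch assembles an alf command line from Python. The helper name build_alf_command and the file names are hypothetical; only the flags -i, -o, -m and -k and the method names come from this page. It assumes an alf executable is available on the PATH if the command is actually run.

```python
from typing import List, Optional

# Method names as listed under -m, --method on this page.
VALID_METHODS = {"N2", "D2", "D2Star", "D2z"}


def build_alf_command(input_file: str,
                      output_file: Optional[str] = None,
                      method: str = "N2",
                      k: int = 4) -> List[str]:
    """Return an argv list for alf (hypothetical helper, not part of ALF)."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if k < 1:
        raise ValueError("k-mer size must be a positive integer")
    cmd = ["alf", "-i", input_file, "-m", method, "-k", str(k)]
    # Without -o, alf writes the score matrix to stdout.
    if output_file is not None:
        cmd += ["-o", output_file]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_alf_command("IN.FASTA", "scores.alf", method="D2", k=5)))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to execute the tool.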
- -h, --help
- Display the help message.
- --version
- Display version information.
- -v, --verbose
- When given, details about the progress are printed to the
screen.
- -i, --input-file INPUT_FILE
- Name of the multi-FASTA input file. Valid filetypes are:
.sam[.*], .raw[.*], .gbk[.*], .frn[.*],
.fq[.*], .fna[.*], .ffn[.*], .fastq[.*],
.fasta[.*], .faa[.*], .fa[.*], .embl[.*], and
.bam, where * is any of the following extensions: gz,
bz2, and bgzf for transparent (de)compression.
- -o, --output-file OUTPUT_FILE
- Name of the file to which the tab-delimited matrix with
pairwise scores will be written. Default is to write to stdout. Valid
filetype is: .alf[.*], where * is any of the following extensions:
tsv for transparent (de)compression.
- -m, --method STRING
- Select method to use. One of N2, D2,
D2Star, and D2z. Default: N2.
- -k, --k-mer-size INTEGER
- Size of the k-mers. Default: 4.
- -mo, --bg-model-order INTEGER
- Order of the background Markov model. Default: 1.
- -rc, --reverse-complement STRING
- Which strand to score. Use both_strands to score
both strands simultaneously. One of input, both_strands,
mean, min, and max. Default: input.
- -mm, --mismatches INTEGER
- Number of mismatches, one of 0 and 1. When
1 is used, N2 also counts the k-mer neighbours with one mismatch. Default:
0.
- -mmw, --mismatch-weight DOUBLE
- Real-valued weight of counts for words with mismatches.
Default: 0.1.
- -kwf, --k-mer-weights-file OUTPUT_FILE
- Print k-mer weights for every sequence to this file if
given. Valid filetype is: .txt.
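
To make the roles of the method and k-mer size options more concrete, here is a minimal sketch of the textbook D2 statistic, the inner product of the k-mer count vectors of two sequences. It is not ALF's implementation: the function names are hypothetical, no background model or normalisation is applied, and the way the two strands are pooled in the both_strands case is an assumption made for illustration.

```python
from collections import Counter

# Complement table for plain DNA letters; other symbols pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count all overlapping k-mers of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int = 4, both_strands: bool = False) -> int:
    """Textbook D2: sum over all k-mers w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    if both_strands:
        # Assumption: pool the counts of the forward and reverse strands.
        counts_a += kmer_counts(reverse_complement(seq_a.upper()), k)
        counts_b += kmer_counts(reverse_complement(seq_b.upper()), k)
    return sum(count * counts_b[word] for word, count in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTACGT", "ACGTTTGA", k=3))
```

Larger values of k make the counts sparser and the score more specific; the N2, D2Star and D2z methods additionally standardise the counts, which this sketch does not attempt.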
- For questions or comments, contact:
- Jonathan Goeke <[email protected]>
- Please reference the following publication if you use ALF
or the N2 method for your analysis:
- Jonathan Goeke, Marcel H. Schulz, Julia Lasserre, and
Martin Vingron. Estimation of Pairwise Sequence Similarity of Mammalian
Enhancers with Word Neighbourhood Counts. Bioinformatics (2012).
- Project Homepage:
- http://www.seqan.de/projects/alf