blasr - Map SMRT Sequences to a reference genome
blasr reads.bam genome.fasta
    --bam --out out.bam
blasr reads.fasta genome.fasta
blasr reads.fasta genome.fasta
    --sa genome.fasta.sa
blasr reads.bax.h5 genome.fasta [--sa genome.fasta.sa]
blasr reads.bax.h5 genome.fasta
    --sa genome.fasta.sa
    --maxScore 100
    --minMatch 15 ...
blasr reads.bax.h5 genome.fasta
    --sa genome.fasta.sa
    --nproc 24
    --out alignment.out ...
blasr is a read mapping program that maps reads to positions in a genome by
clustering short exact matches between the read and the genome, and scoring
clusters using alignment. The matches are generated by searching all suffixes
of a read against the genome using a suffix array. Global chaining methods are
used to score clusters of matches.
The only required inputs to blasr are a file of reads and a reference genome.
Read filtering information is extremely useful, and mapping runtime may
decrease substantially when a precomputed suffix array index of the reference
sequence is specified.
Although reads may be input in FASTA format, the recommended input is PacBio BAM
files, because these contain quality value information that is used in the
alignment and leads to higher quality variant detection. Although alignments
can be output in various formats, the recommended output format is PacBio BAM.
Support for bax.h5 and plx.h5 files will be DEPRECATED. Support for region
tables for h5 files will be DEPRECATED.
When a suffix array index of the genome is not specified, the suffix array is
built before any alignments are produced. This may be prohibitively slow when
the genome is large (e.g. human). It is best to precompute the suffix array of
the genome using the program sawriter, and then specify the suffix array on the
command line using --sa genome.fasta.sa.
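The precompute-then-reuse workflow described above can be sketched as follows. This is only an illustration: the file names are placeholders, and the sawriter argument order (output suffix array first, then the FASTA input) is an assumption that should be checked against the usage message of the installed sawriter. The commands are printed first and run only when the tools and input files are actually present.

```shell
# Placeholder file names; substitute your own reference.
genome=genome.fasta
sa="$genome.sa"

# Assumed sawriter argument order: output suffix array, then FASTA input.
index_cmd="sawriter $sa $genome"
# Mapping command that reuses the precomputed index (flags as in the synopsis).
map_cmd="blasr reads.bam $genome --sa $sa --bam --out out.bam"

# Show what would be run.
echo "$index_cmd"
echo "$map_cmd"

# Build the index only if sawriter is installed, the reference exists,
# and no index file has been written yet.
if command -v sawriter >/dev/null 2>&1 && [ -f "$genome" ] && [ ! -f "$sa" ]; then
    $index_cmd
fi

# Map only if blasr is installed and both the index and the reads exist.
if command -v blasr >/dev/null 2>&1 && [ -f "$sa" ] && [ -f reads.bam ]; then
    $map_cmd
fi
```

Because the index is written once and checked for before rebuilding, repeated mapping runs against the same reference skip the suffix array construction step.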
The optional parameters are roughly divided into three categories: control over
anchoring, alignment scoring, and output.
The default anchoring parameters are optimal for small genomes and samples with
up to 5% divergence from the reference genome. The main parameter governing
speed and sensitivity is the --minMatch parameter. For human genome
alignments, a value of 11 or higher is recommended. Several methods may be
used to speed up alignments, at the expense of possibly decreasing
sensitivity.
Regions that are too repetitive may be ignored during mapping by limiting the
number of positions a read maps to with the --maxAnchorsPerPosition option.
Values between 500 and 1000 are effective in the human genome.
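As a rough illustration of how the anchoring options above might be combined for a human-sized genome, the sketch below assembles one command line. The input and output file names are placeholders, the numeric values are illustrative picks from the ranges given in the text rather than tuned recommendations, and the options are written with the double-dash spelling used in the synopsis. The command is printed, and run only if blasr and the inputs are present.

```shell
# Placeholder inputs; substitute your own files.
reads=reads.bam
genome=genome.fasta

min_match=15      # the text recommends 11 or higher for human alignments
max_anchors=500   # the text reports 500 to 1000 as effective for human
threads=24        # --nproc value borrowed from the synopsis example

cmd="blasr $reads $genome --sa $genome.sa --minMatch $min_match --maxAnchorsPerPosition $max_anchors --nproc $threads --bam --out out.bam"

# Show the assembled command line.
echo "$cmd"

# Run only if blasr is installed and the inputs exist.
if command -v blasr >/dev/null 2>&1 && [ -f "$reads" ] && [ -f "$genome" ]; then
    $cmd
fi
```

Raising min_match or lowering max_anchors trades sensitivity for speed, as the text notes, so the values are worth adjusting against a small subset of reads first.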
For small genomes such as bacterial genomes or BACs, the default parameters are
sufficient for maximal sensitivity and good speed.
This manpage was written by Andreas Tille for the Debian distribution and can be
used for any other usage of the program.