samtools-fasta, samtools-fastq - converts a SAM/BAM/CRAM file to FASTA or FASTQ
samtools fastq [options] in.bam
samtools fasta [options] in.bam
Converts a SAM, BAM or CRAM file into either FASTQ or FASTA format, depending
on the command invoked. Output files are automatically compressed if their
names have a .gz or .bgzf extension.
If the input contains read-pairs which are to be interleaved or written to
separate files in the same order, then the input should be first collated by
name. Use samtools collate or samtools sort -n to ensure this.
For each different QNAME, the input records are categorised according to the
state of the READ1 and READ2 flag bits. The three categories used are:
1 : Only READ1 is set.
2 : Only READ2 is set.
0 : Either both READ1 and READ2 are set; or neither is set.
The exact meaning of these categories depends on the sequencing technology used.
It is expected that ordinary single and paired-end sequencing reads will be in
categories 1 and 2 (in the case of paired-end reads, one read of the pair will
be in category 1, the other in category 2). Category 0 is essentially a
“catch-all” for reads that do not fit into a simple paired-end
sequencing model.
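As a rough illustration of the categorisation described above, the following shell function maps a numeric FLAG value to its category. It is a sketch only and not part of samtools: the name read_category is made up for this example, and the bit values 0x40 (READ1) and 0x80 (READ2) are taken from the SAM specification.

```shell
# Hypothetical helper: print the category (0, 1 or 2) for a SAM FLAG value.
# 0x40 is the READ1 bit and 0x80 is the READ2 bit in the SAM specification.
read_category() {
    flag=$1
    r1=$(( (flag & 0x40) != 0 ))   # 1 if READ1 is set
    r2=$(( (flag & 0x80) != 0 ))   # 1 if READ2 is set
    if [ "$r1" -eq 1 ] && [ "$r2" -eq 0 ]; then
        echo 1                     # only READ1 is set
    elif [ "$r1" -eq 0 ] && [ "$r2" -eq 1 ]; then
        echo 2                     # only READ2 is set
    else
        echo 0                     # both set, or neither set
    fi
}
```

For instance, read_category 99 and read_category 147 correspond to the two reads of a typical properly paired alignment.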
For each category only one sequence will be written for a given QNAME. If more
than one record is available for a given QNAME and category, the first in
input file order that has quality values will be used. If none of the
candidate records has quality values, then the first in input file order will
be used instead.
Sequences will be written to standard output unless one of the -1, -2, -o, or
-0 options is used, in which case sequences for that category will be written
to the specified file. The same filename may be specified with multiple
options, in which case the sequences will be multiplexed in order of
occurrence.
If a singleton file is specified using the -s option then only paired
sequences will be output for categories 1 and 2; paired meaning that for a
given QNAME there are sequences for both category 1 and 2. If there is a
sequence for only one of categories 1 or 2 then it will be diverted into the
specified singletons file. This can be used to prepare FASTQ files for
programs that cannot handle a mixture of paired and singleton reads.
The -s option only affects category 1 and 2 records. The output for
category 0 will be the same irrespective of the use of this option.
- -n
- By default, either '/1' or '/2' is added to the end of read
names where the corresponding READ1 or READ2 FLAG bit is set. Using
-n causes read names to be left as they are.
- -N
- Always add either '/1' or '/2' to the end of read names
even when put into different files.
- -O
- Use quality values from OQ tags in preference to standard
quality string if available.
- -s FILE
- Write singleton reads to FILE.
- -t
- Copy RG, BC and QT tags to the FASTQ header line, if they
exist.
- -T TAGLIST
- Specify a comma-separated list of tags to copy to the FASTQ
header line, if they exist. TAGLIST can be blank or * to
indicate all tags should be copied to the output. If using *, be
careful to quote it to avoid unwanted shell expansion.
- -1 FILE
- Write reads with the READ1 FLAG set (and READ2 not set) to
FILE instead of outputting them. If the -s option is used, only
paired reads will be written to this file.
- -2 FILE
- Write reads with the READ2 FLAG set (and READ1 not set) to
FILE instead of outputting them. If the -s option is used, only
paired reads will be written to this file.
- -o FILE
- Write reads with either READ1 FLAG or READ2 flag set to
FILE instead of outputting them to stdout. This is equivalent to -1
FILE -2 FILE.
- -0 FILE
- Write reads where the READ1 and READ2 FLAG bits are either
both set or both unset to FILE instead of outputting them.
- -f INT
- Only output alignments with all bits set in INT
present in the FLAG field. INT can be specified in hex by beginning
with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e.
/^0[0-7]+/) [0].
- -F INT
- Do not output alignments with any bits set in INT
present in the FLAG field. INT can be specified in hex by beginning
with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e.
/^0[0-7]+/) [0x900]. This defaults to 0x900 representing filtering of
secondary and supplementary alignments.
- -G INT
- Only EXCLUDE reads with all of the bits set in INT
present in the FLAG field. INT can be specified in hex by beginning
with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e.
/^0[0-7]+/) [0].
- -i
- add Illumina Casava 1.8 format entry to header (e.g.
1:N:0:ATCACG)
- -c [0..9]
- set compression level when writing gz or bgzf fastq
files.
- --i1 FILE
- write first index reads to FILE
- --i2 FILE
- write second index reads to FILE
- --barcode-tag TAG
- aux tag to find index reads in [default: BC]
- --quality-tag TAG
- aux tag to find index quality in [default: QT]
- -@, --threads INT
- Number of input/output compression threads to use in
addition to main thread [0].
- --index-format STR
- string to describe how to parse the barcode and quality
tags. For example:
- i14i8
- the first 14 characters are index 1, the next 8 characters
are index 2
- n8i14
- ignore the first 8 characters, and use the next 14
characters for index 1
If the tag contains a separator, then the numeric part can be replaced with
'*' to mean 'read until the separator or end of tag', for example:
- n*i*
- ignore the left part of the tag until the separator, then
use the second part
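The combined effect of the -f, -F and -G filters listed above can be sketched as plain bit arithmetic. The function below is illustrative only and is not how samtools is implemented: the name passes_filters is hypothetical, and it assumes that a -G value of 0 disables that filter, which is inferred from the documented default rather than stated outright.

```shell
# passes_filters FLAG [REQUIRE] [EXCLUDE_ANY] [EXCLUDE_ALL]
# Hypothetical helper: returns 0 (true) if a record with this FLAG would be kept.
passes_filters() {
    flag=$1
    require=${2:-0}          # -f, default 0
    exclude_any=${3:-2304}   # -F, default 0x900 = 2304 (secondary | supplementary)
    exclude_all=${4:-0}      # -G, default 0 (assumed here to mean "disabled")

    # -f: every bit in REQUIRE must be present in FLAG.
    [ $(( (flag & require) == require )) -eq 1 ] || return 1
    # -F: no bit in EXCLUDE_ANY may be present in FLAG.
    [ $(( (flag & exclude_any) == 0 )) -eq 1 ] || return 1
    # -G: drop the record only if all bits in EXCLUDE_ALL are present.
    if [ $(( exclude_all != 0 && (flag & exclude_all) == exclude_all )) -eq 1 ]; then
        return 1
    fi
    return 0
}
```

With the defaults, a primary alignment such as FLAG 99 is kept, while the same read flagged as secondary (99 + 256) or supplementary (99 + 2048) is dropped.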
Starting from a coordinate sorted file, output paired reads to separate files,
discarding singletons, supplementary and secondary reads. The resulting files
can be used with, for example, the bwa aligner.
samtools collate -u -O in_pos.bam | \
samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n
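After a command like the one above, a quick sanity check is that paired1.fq and paired2.fq hold the same number of records. The helper below is a minimal sketch under the assumption that each FASTQ record occupies exactly four lines; the name fastq_records is invented for this example and is not a samtools command.

```shell
# Hypothetical helper: print the number of records in an uncompressed FASTQ file.
# Assumes the conventional four-line layout (header, sequence, '+', qualities).
fastq_records() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Example use (not run here):
#   [ "$(fastq_records paired1.fq)" -eq "$(fastq_records paired2.fq)" ] \
#       && echo "record counts match"
```

A mismatch would suggest that the input was not collated by name or that the files were produced by different runs.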
Starting with a name collated file, output paired and singleton reads in a
single file, discarding supplementary and secondary reads. To get all of the
reads in a single file, it is necessary to redirect the output of samtools
fastq. The output file is suitable for use with bwa mem -p which
understands interleaved files containing a mixture of paired and singleton
reads.
samtools fastq -0 /dev/null in_name.bam > all_reads.fq
Output paired reads in a single file, discarding supplementary and secondary
reads. Save any singletons in a separate file. Append /1 and /2 to read names.
This format is suitable for use by NextGenMap when using its -p and -q
options. With this aligner, paired reads must be mapped separately to the
singletons.
samtools fastq -0 /dev/null -s single.fq -N in_name.bam > paired.fq
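One way to confirm that a file such as paired.fq really is interleaved is to check that its header lines alternate between names ending in /1 and /2. The function below is a sketch, not a samtools feature: check_interleaved is a hypothetical name, and it assumes four-line FASTQ records whose read name is the first whitespace-separated field of the header.

```shell
# Hypothetical helper: succeed if header names alternate /1, /2, /1, /2, ...
# Assumes four-line FASTQ records, so headers are lines 1, 5, 9, ...
check_interleaved() {
    awk 'NR % 4 == 1 {
             n++
             name = $1                               # drop any header comment
             want = (n % 2 == 1) ? "/1" : "/2"
             if (substr(name, length(name) - 1) != want) { bad = 1; exit }
         }
         END { if (bad || n % 2 != 0) exit 1 }' "$1"
}

# Example use (not run here):
#   check_interleaved paired.fq && echo "paired.fq looks interleaved"
```

This only inspects the name suffixes; it does not verify that the two reads of each pair share the same QNAME.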
- The way of specifying output files is far too complicated
and easy to get wrong.
Written by Heng Li, with modifications by Martin Pollard and Jennifer Liddle,
all from the Sanger Institute.
samtools(1),
samtools-faidx(1),
samtools-fqidx(1),
samtools-import(1)
Samtools website: <http://www.htslib.org/>