NAME

samtools-view - views and converts SAM/BAM/CRAM files

SYNOPSIS

samtools view [options] in.sam|in.bam|in.cram [region...]
 

DESCRIPTION

With no options or regions specified, prints all alignments in the specified input alignment file (in SAM, BAM, or CRAM format) to standard output in SAM format (with no header).
 
You may specify one or more space-separated region specifications after the input filename to restrict output to only those alignments which overlap the specified region(s). Use of region specifications requires a coordinate-sorted and indexed input file (in BAM or CRAM format).
 
The -b, -C, -1, -u, -h, -H, and -c options change the output format from the default of headerless SAM, and the -o and -U options set the output file name(s).
 
The -t and -T options provide additional reference data. One of these two options is required when SAM input does not contain @SQ headers, and the -T option is required whenever writing CRAM output.
 
The -L, -M, -N, -r, -R, -d, -D, -s, -q, -l, -m, -f, -F, -G, and --rf options filter the alignments that will be included in the output to only those alignments that match certain criteria.
 
The -p option sets the UNMAP flag on alignments that fail the filters, then writes them to the output file.
 
The -x, -B, --add-flags, and --remove-flags options modify the data which is contained in each alignment.
 
The -X option lets you specify the index file location(s) explicitly when the index is not stored alongside the data file. See the EXAMPLES section for sample usage.
 
Finally, the -@ option can be used to allocate additional threads to be used for compression, and the -? option requests a long help message.
 
REGIONS:
Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all position coordinates are 1-based.
 
Important note: when multiple regions are given, some alignments may be output multiple times if they overlap more than one of the specified regions.
 
Examples of region specifications:
chr1
Output all alignments mapped to the reference sequence named `chr1' (i.e. @SQ SN:chr1).
chr2:1000000
The region on chr2 beginning at base position 1,000,000 and ending at the end of the chromosome.
chr3:1000-2000
The 1001bp region on chr3 beginning at base position 1,000 and ending at base position 2,000 (including both end positions).
'*'
Output the unmapped reads at the end of the file. (This does not include any unmapped reads placed on a reference sequence alongside their mapped mates.)
.
Output all alignments. (Mostly unnecessary as not specifying a region at all has the same effect.)
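As a small illustration of the 1-based, inclusive coordinates used in region specifications, the sketch below computes the length of a region written as RNAME:STARTPOS-ENDPOS. The helper name region_length is hypothetical (it is not part of samtools), and it assumes the full three-part form with plain digits and a reference name that contains no ':' character.

```shell
# region_length is a hypothetical helper, not a samtools command.
# Assumes RNAME:STARTPOS-ENDPOS with plain digits (no thousands separators)
# and no ':' inside RNAME. The ( ... ) body runs in a subshell so the
# temporary variables do not leak into the caller.
region_length() (
    coords=${1#*:}        # strip "RNAME:"   -> "1000-2000"
    start=${coords%-*}    # text before '-'  -> "1000"
    end=${coords#*-}      # text after '-'   -> "2000"
    # Both end positions are included, hence the +1.
    echo $(( $end - $start + 1 ))
)

region_length chr3:1000-2000    # prints 1001
```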
 
 

OPTIONS

-b, --bam
Output in the BAM format.
-C, --cram
Output in the CRAM format (requires -T).
-1, --fast
Enable fast compression. This also changes the default output format to BAM, but this can be overridden by the explicit format options or using a filename with a known suffix.
-u, --uncompressed
Output uncompressed data. This also changes the default output format to BAM, but this can be overridden by the explicit format options or using a filename with a known suffix. This option saves time spent on compression/decompression and is thus preferred when the output is piped to another samtools command.
-h, --with-header
Include the header in the output.
-H, --header-only
Output the header only.
--no-header
When producing SAM format, output alignment records but not headers. This is the default; the option can be used to reset the effect of -h/-H.
-c, --count
Instead of printing the alignments, only count them and print the total number. All filter options, such as -f, -F, and -q, are taken into account. The -p option is ignored in this mode.
-?, --help
Output long help and exit immediately.
-o FILE, --output FILE
Output to FILE [stdout].
-U FILE, --unoutput FILE, --output-unselected FILE
Write alignments that are not selected by the various filter options to FILE. When this option is used, all alignments (or all alignments intersecting the regions specified) are written to either the output file or this file, but never both.
-p, --unmap
Set the UNMAP flag on alignments that are not selected by the filter options. These alignments are then written to the normal output. This is not compatible with -U.
-t FILE, --fai-reference FILE
A tab-delimited FILE. Each line must contain the reference name in the first column and the length of the reference in the second column, with one line for each distinct reference. Any additional fields beyond the second column are ignored. This file also defines the order of the reference sequences in sorting. If you run: `samtools faidx <ref.fa>', the resulting index file <ref.fa>.fai can be used as this FILE.
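The two-column layout expected by -t can be checked with a short awk filter. This is only a sketch: the function name check_ref_lengths and the file name in the usage line are hypothetical, and the check covers just the rule stated above (a name in column 1 and an integer length in column 2, with any further columns ignored).

```shell
# check_ref_lengths is a hypothetical helper, not part of samtools.
# Succeeds (exit status 0) when every line of the file has at least two
# tab-separated columns, a non-empty name and an all-digit length.
check_ref_lengths() {
    awk -F '\t' '
        NF < 2 || $1 == "" || $2 !~ /^[0-9]+$/ { bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage (refs.tsv is a placeholder file name):
#   check_ref_lengths refs.tsv && echo "looks usable for -t"
```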
-T FILE, --reference FILE
A FASTA format reference FILE, optionally compressed by bgzip and ideally indexed by samtools faidx. If an index is not present one will be generated for you, if the reference file is local. If the reference file is not local, but is accessed instead via an https://, s3:// or other URL, the index file will need to be supplied by the server alongside the reference. It is possible to have the reference and index files in different locations by supplying both to this option separated by the string "##idx##", for example:
-T ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai
However, note that only the location of the reference will be stored in the output file header. If this method is used to make CRAM files, the cram reader may not be able to find the index, and may not be able to decode the file unless it can get the references it needs using a different method.
-L FILE, --target-file FILE, --targets-file FILE
Only output alignments overlapping the input BED FILE [null].
-M, --use-index
Use the multi-region iterator on the union of a BED file and command-line region arguments. This avoids re-reading the same regions of files so can sometimes be much faster. Note that this also removes duplicate output: without it, an alignment that overlaps multiple regions specified on the command line is reported multiple times. Using a BED file is optional; if one is given, its path must be preceded by the -L option.
--region-file FILE, --regions-file FILE
Use an index and multi-region iterator to only output alignments overlapping the input BED FILE. Equivalent to -M -L FILE or --use-index --target-file FILE.
-N FILE, --qname-file FILE
Output only alignments with read names listed in FILE.
-r STR, --read-group STR
Output alignments in read group STR [null]. Note that records with no RG tag will also be output when using this option. This behaviour may change in a future release.
-R FILE, --read-group-file FILE
Output alignments in read groups listed in FILE [null]. Note that records with no RG tag will also be output when using this option. This behaviour may change in a future release.
-d STR1[:STR2], --tag STR1[:STR2]
Only output alignments with tag STR1 and associated value STR2, which can be a string or an integer [null]. The value can be omitted, in which case only the tag is considered.
-D STR:FILE, --tag-file STR:FILE
Only output alignments with tag STR and associated values listed in FILE [null].
-q INT, --min-MQ INT
Skip alignments with MAPQ smaller than INT [0].
-l STR, --library STR
Only output alignments in library STR [null].
-m INT, --min-qlen INT
Only output alignments with number of CIGAR bases consuming query sequence ≥ INT [0].
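To make the -m criterion concrete, the sketch below totals the lengths of the CIGAR operations that consume query bases. It assumes the SAM specification's convention that M, I, S, = and X consume the query while D, N, H and P do not; cigar_qlen is a hypothetical helper name, and the code is an illustration rather than the samtools implementation.

```shell
# cigar_qlen is a hypothetical helper, not part of samtools.
# Prints the number of query-consuming bases in a CIGAR string.
cigar_qlen() {
    printf '%s\n' "$1" | awk '{
        n = 0; s = $0
        # Peel off one <length><operation> pair at a time.
        while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(s, 1, RLENGTH - 1) + 0
            op  = substr(s, RLENGTH, 1)
            if (op ~ /[MIS=X]/) n += len    # only these consume the query
            s = substr(s, RLENGTH + 1)
        }
        print n
    }'
}

cigar_qlen 5S90M2I3D    # prints 97 (5 + 90 + 2; the 3D deletion is not counted)
```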
-e STR, --expr STR
Only include alignments that match the filter expression STR. The syntax for these expressions is described in the main samtools(1) man page under the FILTER EXPRESSIONS heading.
-f FLAG, --require-flags FLAG
Only output alignments with all bits set in FLAG present in the FLAG field. FLAG can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of flag names. For a list of flag names see samtools-flags(1).
-F FLAG, --excl-flags FLAG, --exclude-flags FLAG
Do not output alignments with any bits set in FLAG present in the FLAG field. FLAG can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of flag names.
--rf FLAG, --incl-flags FLAG, --include-flags FLAG
Only output alignments with any bit set in FLAG present in the FLAG field. FLAG can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of flag names.
-G FLAG
Do not output alignments with all bits set in FLAG present in the FLAG field. This is the opposite of -f: every alignment is selected by exactly one of -f12 and -G12, so the two outputs together cover the whole input. FLAG can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of flag names.
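The four flag filters (-f, -F, --rf and -G) reduce to simple bit tests, sketched here with shell arithmetic. The keep_* function names are hypothetical, the FLAG argument is assumed to be numeric (decimal, 0x-prefixed hex, or 0-prefixed octal; flag names are not handled), and the --rf test assumes a non-zero FLAG argument.

```shell
# Hypothetical helpers: $1 is an alignment's FLAG value, $2 is the option's
# numeric FLAG argument. Each returns exit status 0 when the alignment
# would be kept by that option alone.
keep_f()  { [ $(( $1 & $2 )) -eq $(( $2 )) ]; }   # -f:   all bits of $2 are set
keep_F()  { [ $(( $1 & $2 )) -eq 0 ]; }           # -F:   no bit of $2 is set
keep_rf() { [ $(( $1 & $2 )) -ne 0 ]; }           # --rf: at least one bit of $2 is set
keep_G()  { [ $(( $1 & $2 )) -ne $(( $2 )) ]; }   # -G:   not all bits of $2 are set

# FLAG 99 = 0x63: paired, proper pair, mate reverse strand, first in pair.
keep_f  99 0x3  && echo "-f 0x3: kept"
keep_F  99 0x4  && echo "-F 0x4: kept"
keep_rf 99 0x30 && echo "--rf 0x30: kept"
keep_G  99 014  && echo "-G 014: kept"     # octal 014 is decimal 12
```

For any FLAG value and any mask, exactly one of keep_f and keep_G succeeds, which is the sense in which -G is the opposite of -f.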
-x STR, --remove-tag STR
Read tag(s) to exclude from output (repeatable) [null]. This can be a single tag or a comma separated list. Alternatively the option itself can be repeated multiple times. If the list starts with a `^' then it is negated and treated as a request to remove all tags except those in STR. The list may be empty, so -x ^ will remove all tags. Note that tags will only be removed from reads that pass filtering.
--keep-tag STR
This keeps only tags listed in STR and is directly equivalent to --remove-tag ^STR. Specifying an empty list will remove all tags. If both --keep-tag and --remove-tag are specified then --keep-tag has precedence. Note that tags will only be removed from reads that pass filtering.
-B, --remove-B
Collapse the backward CIGAR operation.
--add-flags FLAG
Adds flag(s) to read. FLAG can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of flag names.
--remove-flags FLAG
Remove flag(s) from read. FLAG is specified in the same way as with the --add-flags option.
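In bit terms, --add-flags ORs the given bits into the FLAG field and --remove-flags clears them. A minimal sketch with shell arithmetic follows; add_flags and remove_flags are hypothetical helper names, and only numeric FLAG arguments are handled.

```shell
# Hypothetical helpers: $1 is the current FLAG value, $2 the bits to change.
add_flags()    { echo $(( $1 | $2 )); }       # set the bits of $2 in $1
remove_flags() { echo $(( $1 & ~($2) )); }    # clear the bits of $2 in $1

# 0x400 (1024) is the DUP bit; 1123 = 99 + 1024.
remove_flags 1123 0x400    # prints 99
add_flags 99 0x400         # prints 1123
```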
--subsample FLOAT
Output only a proportion of the input alignments, as specified by 0.0 ≤ FLOAT ≤ 1.0, which gives the fraction of templates/pairs to be kept. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate.
--subsample-seed INT
Subsampling seed used to influence which subset of reads is kept. When subsampling data that has previously been subsampled, be sure to use a different seed value from those used previously; otherwise more reads will be retained than expected. [0]
-s FLOAT
Subsampling shorthand option: -s INT.FRAC is equivalent to --subsample-seed INT --subsample 0.FRAC.
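The shorthand can be unpacked mechanically: the digits before the decimal point become the seed and the digits after it become the fraction. The helper below (expand_s, a hypothetical name) prints the equivalent long options; it assumes its argument is written in the INT.FRAC form with an explicit decimal point.

```shell
# expand_s is a hypothetical helper, not part of samtools.
# The ( ... ) body runs in a subshell so seed and frac stay local.
expand_s() (
    seed=${1%%.*}    # digits before the first '.'
    frac=${1#*.}     # digits after the first '.'
    echo "--subsample-seed $seed --subsample 0.$frac"
)

expand_s 42.25    # prints: --subsample-seed 42 --subsample 0.25
```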
-@ INT, --threads INT
Number of BAM compression threads to use in addition to main thread [0].
-P, --fetch-pairs
Retrieve pairs even when the mate is outside of the requested region. Enabling this option also turns on the multi-region iterator (-M). A region to search must be specified, either on the command-line, or using the -L option. The input file must be an indexed regular file. This option first scans the requested region, using the RNEXT and PNEXT fields of the records that have the PAIRED flag set and pass other filtering options to find where paired reads are located. These locations are used to build an expanded region list, and a set of QNAMEs to allow from the new regions. It will then make a second pass, collecting all reads from the originally-specified region list together with reads from additional locations that match the allowed set of QNAMEs. Any other filtering options used will be applied to all reads found during this second pass. As this option links reads using RNEXT and PNEXT, it is important that these fields are set accurately. Use 'samtools fixmate' to correct them if necessary. Note that this option does not work with the -c, --count; -U, --output-unselected; or -p, --unmap options.
-S
Ignored for compatibility with previous samtools versions. Previously this option was required if input was in SAM format, but now the correct format is automatically detected by examining the first few characters of input.
-X, --customized-index
Read the index file location(s) from the command-line arguments that follow the input file name. See the EXAMPLES section for sample usage.
--no-PG
Do not add a @PG line to the header of the output file.

EXAMPLES

o
Import SAM to BAM when @SQ lines are present in the header:
samtools view -bo aln.bam aln.sam
    

If @SQ lines are absent:
samtools faidx ref.fa
samtools view -bt ref.fa.fai -o aln.bam aln.sam
    

where ref.fa.fai is generated automatically by the faidx command.
o
Convert a BAM file to a CRAM file using a local reference sequence.
samtools view -C -T ref.fa -o aln.cram aln.bam
    

o
Convert a BAM file to a CRAM with NM and MD tags stored verbatim rather than calculating on the fly during CRAM decode, so that mixed data sets with MD/NM only on some records, or NM calculated using different definitions of mismatch, can be decoded without change. The second command demonstrates how to decode such a file. The request to not decode MD here is turning off auto-generation of both MD and NM; it will still emit the MD/NM tags on records that had these stored verbatim.
samtools view -C --output-fmt-option store_md=1 --output-fmt-option store_nm=1 -o aln.cram aln.bam
samtools view --input-fmt-option decode_md=0 -o aln.new.bam aln.cram
    

o
An alternative way of achieving the above is listing multiple options after the --output-fmt or -O option. The commands below are equivalent to the two above.
samtools view -O cram,store_md=1,store_nm=1 -o aln.cram aln.bam
samtools view --input-fmt cram,decode_md=0 -o aln.new.bam aln.cram
    

o
Include customized index file as a part of arguments.
samtools view [options] -X /data_folder/data.bam /index_folder/data.bai chrM:1-10
    

o
Output alignments in read group grp2 (records with no RG tag will also be in the output).
samtools view -r grp2 -o /data_folder/data.rg2.bam /data_folder/data.bam
    

o
Only keep reads with tag BC where the barcode matches one of the barcodes listed in the barcode file.
samtools view -D BC:barcodes.txt -o /data_folder/data.barcodes.bam /data_folder/data.bam
    

o
Only keep reads with tag RG and read group grp2. This does almost the same as -r grp2 but will not keep records without the RG tag.
samtools view -d RG:grp2 -o /data_folder/data.rg2_only.bam /data_folder/data.bam
    

o
Remove the actions of samtools markdup. Clear the duplicate flag and remove the dt tag, keep the header.
samtools view -h --remove-flags DUP -x dt -o /data_folder/data.no_dup_markings.bam /data_folder/data.bam
    

AUTHOR

Written by Heng Li from the Sanger Institute.
 

SEE ALSO

samtools(1), samtools-tview(1), sam(5)
Samtools website: <http://www.htslib.org/>
