bamintervalcomment - annotate alignments in BAM, SAM or CRAM files with matching named intervals
bamintervalcomment [options]
bamintervalcomment reads a BAM, SAM or CRAM file and a file containing a list of
named intervals, marks each alignment in the input with the list of all matching
intervals and stores the resulting file in BAM, SAM or CRAM format. The
intervals file must be given using the intervals key. The file can be
either plain text or compressed using gzip. The intervals file is expected to
contain one interval per line. Each line is assumed to contain a tab-separated
list of values, where the following columns are used by the program:
- first and second column: contain a pair of names which form the id of the
  interval
- third column: gives the name of the reference sequence containing the
  interval
- fifth and sixth column: give the interval on the reference sequence
  designated by the third column as a pair of non-negative integers. Both
  borders are included.
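The column layout above can be illustrated with a short Python sketch. This is not the program's own parser: the names Interval, open_maybe_gzip and read_intervals are hypothetical, and skipping lines with fewer than six columns is an assumption made for the example.

```python
import gzip
from typing import Iterator, NamedTuple


class Interval(NamedTuple):
    id_a: str      # first column
    id_b: str      # second column
    ref_name: str  # third column
    start: int     # fifth column
    end: int       # sixth column (border is included)


def open_maybe_gzip(path: str):
    """Open a plain or gzip-compressed text file, detected by the gzip magic bytes."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_intervals(path: str) -> Iterator[Interval]:
    """Yield one Interval per tab-separated line of the intervals file."""
    with open_maybe_gzip(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # assumption: silently skip lines that are too short
            yield Interval(fields[0], fields[1], fields[2],
                           int(fields[4]), int(fields[5]))
```

The fourth column and any columns after the sixth are ignored, matching the description above.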
For each alignment the matching interval designators are stored in the CO
(comment) auxiliary field in the form of a semicolon-separated list, where
each list element is a pair (A,B) given by the two id columns of the respective
interval. An example of an interval file can be found at
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz .
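As a rough illustration of the comment format, the following Python sketch builds the semicolon-separated list of (A,B) pairs for one alignment. It rests on several assumptions not stated in this page: that "matching" means the alignment's reference span overlaps the interval, that alignment and interval coordinates use the same convention, and that pairs are written exactly as "(A,B)" without spaces. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (id_a, id_b, ref_name, start, end); both borders are included
IntervalRow = Tuple[str, str, str, int, int]


def matching_intervals(ref_name: str, aln_start: int, aln_end: int,
                       intervals: Iterable[IntervalRow]) -> List[IntervalRow]:
    """Return intervals on ref_name that overlap [aln_start, aln_end].

    Assumption: a match is any overlap of the two closed ranges.
    """
    return [iv for iv in intervals
            if iv[2] == ref_name and iv[3] <= aln_end and aln_start <= iv[4]]


def format_comment(matches: Iterable[IntervalRow]) -> str:
    """Join the id pairs of the matching intervals as (A,B);(C,D);..."""
    return ";".join(f"({iv[0]},{iv[1]})" for iv in matches)
```

With this sketch, an alignment overlapping two intervals with ids (geneA, txA) and (geneB, txB) would yield the string (geneA,txA);(geneB,txB).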
The following key=value pairs can be given when running the program:
level=<-1|0|1|9|11>: set compression level of the output BAM file.
Valid values are
- -1: zlib/gzip default compression level
- 0: uncompressed
- 1: zlib/gzip level 1 (fast) compression
- 9: zlib/gzip level 9 (best) compression
If libmaus has been compiled with support for igzip (see
https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data)
then an additional valid value is
- 11: igzip compression
verbose=<1>: Valid values are
- 1: print progress report on standard error
- 0: do not print progress report
tmpfile=<filename>: set the prefix for temporary file names
disablevalidation=<0|1>: sets whether input validation is
performed. Valid values are
- 0: validation is enabled (default)
- 1: validation is disabled
md5=<0|1>: md5 checksum creation for output file. This option can
only be given if outputformat=bam. Then valid values are
- 0: do not compute checksum. This is the default.
- 1: compute checksum. If the md5filename key is set, then the
  checksum is written to the given file. If md5filename is unset, then no
  checksum will be computed.
md5filename=<filename>: file name for md5 checksum if md5=1.
index=<0|1>: compute BAM index for output file. This option can
only be given if outputformat=bam. Then valid values are
- 0: do not compute BAM index. This is the default.
- 1: compute BAM index. If the indexfilename key is set, then
  the BAM index is written to the given file. If indexfilename is unset,
  then no BAM index will be computed.
indexfilename=<filename>: file name for output BAM index if index=1.
inputformat=<bam>: input file format. All versions of
bamintervalcomment come with support for the BAM input format. If the program
in addition is linked to the io_lib package, then the following options are
valid:
- bam: BAM (see http://samtools.sourceforge.net/SAM1.pdf)
- sam: SAM (see http://samtools.sourceforge.net/SAM1.pdf)
- cram: CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)
outputformat=<bam>: output file format. All versions of
bamintervalcomment come with support for the BAM output format. If the program
in addition is linked to the io_lib package, then the following options are
valid:
- bam: BAM (see http://samtools.sourceforge.net/SAM1.pdf)
- sam: SAM (see http://samtools.sourceforge.net/SAM1.pdf)
- cram: CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit).
  This format is not advisable for data sorted by query name.
I=<[stdin]>: input filename, standard input if unset.
O=<[stdout]>: output filename, standard output if unset.
inputthreads=<[1]>: input helper threads, only valid for
inputformat=bam.
outputthreads=<[1]>: output helper threads, only valid for
outputformat=bam.
reference=<[]>: reference FastA file for inputformat=cram and
outputformat=cram. An index file (.fai) is required.
range=<>: input range to be processed. This option is only valid if
the input is a coordinate-sorted and indexed BAM file.
intervals=<>: file name of intervals file
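The key=value conventions and the restrictions stated for individual keys can be sketched in Python as follows. This is an illustration only: check_options and build_argv are hypothetical helpers, the file names in the usage note are placeholders, and the checks cover just the constraints spelled out in this page (they do not reproduce the program's own argument handling).

```python
from typing import Dict, List


def check_options(opts: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in a key=value option set."""
    problems: List[str] = []
    # Both formats default to bam when the key is not given.
    in_bam = opts.get("inputformat", "bam") == "bam"
    out_bam = opts.get("outputformat", "bam") == "bam"

    if not opts.get("intervals"):
        problems.append("intervals: the intervals file must be given")
    for key in ("md5", "index", "outputthreads"):
        if key in opts and not out_bam:
            problems.append(f"{key}: only valid for outputformat=bam")
    for key in ("inputthreads", "range"):
        if key in opts and not in_bam:
            problems.append(f"{key}: only valid for inputformat=bam")
    for key in ("md5", "index"):
        if opts.get(key) == "1" and not opts.get(key + "filename"):
            problems.append(f"{key}=1 has no effect unless {key}filename is set")
    return problems


def build_argv(opts: Dict[str, str],
               program: str = "bamintervalcomment") -> List[str]:
    """Turn an option dictionary into an argument vector of key=value pairs."""
    return [program] + [f"{key}={value}" for key, value in opts.items()]
```

For example, the options intervals=refFlat.txt.gz, I=in.bam, O=out.bam, index=1 and indexfilename=out.bam.bai pass the checks, and the resulting list from build_argv could be handed to subprocess.run to start the program.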
Written by German Tischler.
Report bugs to <[email protected]>
Copyright © 2009-2014 German Tischler, © 2011-2014 Genome Research
Limited. License GPLv3+: GNU GPL version 3
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.