bamvalidate - validate BAM file
bamvalidate [options]
bamvalidate reads a BAM, SAM or CRAM file and checks the contained alignments
for validity. If all the alignments in the source file pass validation, then
the program exits with code 0; otherwise it exits with a non-zero code. By
default no data is produced on the standard output channel. If the passthrough
parameter is set to 1, then the alignments are recoded to standard output
according to the given output parameters.
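The exit code convention above can be used from a script. The following is a
minimal Python sketch, not part of bamvalidate itself: the helper names
build_command and validate are hypothetical, and only the key=value argument
style and the I and verbose keys are taken from this page.

```python
import subprocess


def build_command(options, executable="bamvalidate"):
    """Turn a dict of options into an argv list of key=value arguments."""
    return [executable] + [f"{key}={value}" for key, value in options.items()]


def validate(path, executable="bamvalidate"):
    """Return True if the validator exits with code 0 for the given file.

    Assumes an executable named `bamvalidate` is available on PATH.
    """
    command = build_command({"I": path, "verbose": 0}, executable)
    # Standard output is discarded; with the default passthrough=0
    # no data is expected there anyway.
    result = subprocess.run(command, stdout=subprocess.DEVNULL)
    return result.returncode == 0
```

A caller would then write something like `if not validate("in.bam"): ...`,
where in.bam is a placeholder file name.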
The following key=value pairs can be given:
verbose=<0|1>: Valid values are
- 1:
- print statistics on the standard error channel at the end
of a successful run
- 0:
- do not print statistics
passthrough=<0|1>: recode alignments to standard output if
passthrough=1. Default is passthrough=0.
level=<-1|0|1|9|11>: set compression level of the output BAM file.
Valid values are
- -1:
- zlib/gzip default compression level
- 0:
- uncompressed
- 1:
- zlib/gzip level 1 (fast) compression
- 9:
- zlib/gzip level 9 (best) compression
If libmaus has been compiled with support for igzip (see
https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data)
then an additional valid value is
- 11:
- igzip compression
tmpfile=<filename>: set the prefix for temporary file names
md5=<0|1>: md5 checksum creation for output file. This option can
only be given if outputformat=bam. Then valid values are
- 0:
- do not compute checksum. This is the default.
- 1:
- compute checksum. If the md5filename key is set, then the
checksum is written to the given file. If md5filename is unset, then no
checksum will be computed.
md5filename=<filename>: file name for md5 checksum if md5=1.
index=<0|1>: compute BAM index for output file. This option can
only be given if outputformat=bam. Then valid values are
- 0:
- do not compute BAM index. This is the default.
- 1:
- compute BAM index. If the indexfilename key is set, then
the BAM index is written to the given file. If indexfilename is unset,
then no BAM index will be computed.
indexfilename=<filename>: file name for output BAM index if index=1.
inputformat=<bam>: input file format. All versions of bamvalidate come
with support for the BAM input format. If the program is additionally linked to
the io_lib package, then the following options are valid:
- bam:
- BAM (see http://samtools.sourceforge.net/SAM1.pdf)
- sam:
- SAM (see http://samtools.sourceforge.net/SAM1.pdf)
- cram:
- CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)
outputformat=<bam>: output file format. All versions of bamvalidate
come with support for the BAM output format. If the program is additionally
linked to the io_lib package, then the following options are valid:
- bam:
- BAM (see http://samtools.sourceforge.net/SAM1.pdf)
- sam:
- SAM (see http://samtools.sourceforge.net/SAM1.pdf)
- cram:
- CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit).
This format is not advisable for data sorted by query name.
I=<[stdin]>: input filename, standard input if unset.
O=<[stdout]>: output filename, standard output if unset.
inputthreads=<[1]>: input helper threads, only valid for
inputformat=bam.
outputthreads=<[1]>: output helper threads, only valid for
outputformat=bam.
reference=<[]>: reference FastA file for inputformat=cram and
outputformat=cram. An index file (.fai) is required.
range=<>: input range to be processed. This option is only valid if
the input is a coordinate-sorted and indexed BAM file.
basequalhist=<[0]>: compute a base quality histogram and print this
histogram on the standard error channel after a successful validation run. The
histogram lines are prefixed with [H]. There is one line for each occurring
base quality value, followed by two lines giving the minimum and maximum
occurring quality values. All lines consist of tab-separated columns.
Each base quality line has 5 columns after the [H] column: the numerical
quality level, the ASCII character representation of this level (e.g. ! for
quality 0 or I for 40), the absolute number of occurrences of the value, the
fraction of occurrences of the value (the absolute number for this value
divided by the sum of the occurrences over all values), and the cumulative
fraction for all values up to and including the current value. The minimum and
maximum lines have min and max respectively in the second column; the third
and fourth columns contain the numerical value and the ASCII representation of
this value. On any line, the ASCII representation of a base quality value may
be empty if the numerical quality value does not correspond to a printable
ASCII code.
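The [H] line layout described for basequalhist can be read back with a few
lines of Python. This is a sketch under assumptions: the names QualityLine,
parse_histogram and SAMPLE are hypothetical, the sample text is invented for
illustration and is not real bamvalidate output, and the exact number
formatting of the fraction columns is assumed to be something float() accepts.

```python
from typing import NamedTuple


class QualityLine(NamedTuple):
    quality: int       # numerical quality level
    symbol: str        # ASCII representation, may be empty
    count: int         # absolute number of occurrences
    fraction: float    # count divided by the total over all values
    cumulative: float  # cumulative fraction up to and including this value


def parse_histogram(stderr_text):
    """Return (rows, minimum, maximum) parsed from [H]-prefixed lines."""
    rows = []
    bounds = {"min": None, "max": None}
    for line in stderr_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "[H]":
            continue  # not a histogram line
        if len(fields) >= 3 and fields[1] in bounds:
            # min/max line: keyword, numerical value, ASCII representation
            bounds[fields[1]] = int(fields[2])
        elif len(fields) >= 6:
            rows.append(QualityLine(int(fields[1]), fields[2], int(fields[3]),
                                    float(fields[4]), float(fields[5])))
    return rows, bounds["min"], bounds["max"]


# Invented sample input following the column layout described above.
SAMPLE = (
    "[H]\t0\t!\t5\t0.25\t0.25\n"
    "[H]\t40\tI\t15\t0.75\t1\n"
    "[H]\tmin\t0\t!\n"
    "[H]\tmax\t40\tI\n"
)
```

Lines are split on tabs without stripping whitespace first, so an empty ASCII
column is kept as an empty string rather than shifting the later columns.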
Written by German Tischler.
Report bugs to <[email protected]>
Copyright © 2009-2014 German Tischler, © 2011-2014 Genome Research
Limited. License GPLv3+: GNU GPL version 3
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.