NAME

bamvalidate - validate BAM file

SYNOPSIS

bamvalidate [options]

DESCRIPTION

bamvalidate reads a BAM, SAM or CRAM file and checks the contained alignments for validity. If all the alignments in the source file pass validation, the program exits with code 0; otherwise it exits with a non-zero code. By default no data is produced on the standard output channel. If the passthrough parameter is set to 1, the alignments are recoded to standard output according to the given output parameters.
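The exit-code behaviour described above lends itself to scripted checks. The following is a minimal Python sketch, assuming a bamvalidate binary is available on the PATH; the helper names build_command and is_valid are hypothetical and not part of bamvalidate. Options are passed through as key=value arguments.

```python
import subprocess


def build_command(input_path, **options):
    """Build a bamvalidate argument list from key=value options.

    The executable name "bamvalidate" is assumed to be on the PATH.
    """
    args = ["bamvalidate", f"I={input_path}"]
    for key, value in options.items():
        args.append(f"{key}={value}")
    return args


def is_valid(input_path, **options):
    """Return True if bamvalidate exits with code 0 for the given file."""
    # Standard output is discarded; it only carries data if passthrough=1.
    result = subprocess.run(
        build_command(input_path, **options),
        stdout=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A call such as is_valid("sample.bam", verbose=1) would run bamvalidate I=sample.bam verbose=1 and report whether validation succeeded; the file name is only an illustration.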
The following key=value pairs can be given:
verbose=<0|1>: Valid values are
1:
print statistics on the standard error channel at the end of a successful run
0:
do not print statistics
passthrough=<0|1>: recode alignments to standard output if passthrough=1. Default is passthrough=0.
level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are
-1:
zlib/gzip default compression level
0:
uncompressed
1:
zlib/gzip level 1 (fast) compression
9:
zlib/gzip level 9 (best) compression
If libmaus has been compiled with support for igzip (see https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data) then an additional valid value is
11:
igzip compression
tmpfile=<filename>: set the prefix for temporary file names
md5=<0|1>: md5 checksum creation for output file. This option can only be given if outputformat=bam. Then valid values are
0:
do not compute checksum. This is the default.
1:
compute checksum. If the md5filename key is set, then the checksum is written to the given file. If md5filename is unset, then no checksum will be computed.
md5filename=<filename>: file name for the md5 checksum if md5=1.
index=<0|1>: compute BAM index for output file. This option can only be given if outputformat=bam. Then valid values are
0:
do not compute BAM index. This is the default.
1:
compute BAM index. If the indexfilename key is set, then the BAM index is written to the given file. If indexfilename is unset, then no BAM index will be computed.
indexfilename=<filename>: file name for the output BAM index if index=1.
inputformat=<bam>: input file format. All versions of bamvalidate come with support for the BAM input format. If the program is additionally linked against the io_lib package, then the following options are valid:
bam:
BAM (see http://samtools.sourceforge.net/SAM1.pdf)
sam:
SAM (see http://samtools.sourceforge.net/SAM1.pdf)
cram:
CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)
outputformat=<bam>: output file format. All versions of bamvalidate come with support for the BAM output format. If the program is additionally linked against the io_lib package, then the following options are valid:
bam:
BAM (see http://samtools.sourceforge.net/SAM1.pdf)
sam:
SAM (see http://samtools.sourceforge.net/SAM1.pdf)
cram:
CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit). This format is not advisable for data sorted by query name.
I=<[stdin]>: input filename, standard input if unset.
O=<[stdout]>: output filename, standard output if unset.
inputthreads=<[1]>: input helper threads, only valid for inputformat=bam.
outputthreads=<[1]>: output helper threads, only valid for outputformat=bam.
reference=<[]>: reference FastA file for inputformat=cram and outputformat=cram. An index file (.fai) is required.
range=<>: input range to be processed. This option is only valid if the input is a coordinate-sorted and indexed BAM file.
basequalhist=<[0]>: compute a base quality histogram and print it on the standard error channel after a successful validation run. The histogram lines are prefixed with [H]. There is one line for each occurring base quality value, followed by two lines at the end giving the minimum and maximum occurring quality values. The lines consist of tab-separated columns. Each base quality line has 5 columns after the [H] column: the numerical quality level, the ASCII character representation of this level (e.g. ! for quality 0 or I for 40), the absolute number of occurrences of the value, the fraction of occurrences of the value (the absolute number for this value divided by the sum of the occurrences over all values) and the cumulative fraction for all values up to and including the current value. The minimum and maximum lines have min and max respectively in the second column; the third and fourth columns contain the numerical value and its ASCII representation. On any line, the ASCII representation of a base quality value may be empty if the numerical quality value does not correspond to a printable ASCII code.
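The histogram layout described for basequalhist can be read back with a few lines of Python. This is a sketch only: the function name parse_histogram and the SAMPLE text are hypothetical, and the exact number formatting bamvalidate prints is an assumption. The column order follows the description above, and the examples given there (! for 0, I for 40) are consistent with an ASCII offset of 33.

```python
# Hypothetical histogram text in the documented tab-separated layout;
# not real bamvalidate output.
SAMPLE = (
    "[H]\t0\t!\t5\t0.25\t0.25\n"
    "[H]\t40\tI\t15\t0.75\t1\n"
    "[H]\tmin\t0\t!\n"
    "[H]\tmax\t40\tI\n"
)


def parse_histogram(stderr_text):
    """Split [H] lines into per-quality rows and the min/max summary."""
    rows = []
    extremes = {}
    for line in stderr_text.splitlines():
        fields = line.split("\t")
        # Ignore anything that is not a histogram line.
        if len(fields) < 3 or fields[0] != "[H]":
            continue
        if fields[1] in ("min", "max"):
            # Summary line: label, numerical value, ASCII representation.
            extremes[fields[1]] = int(fields[2])
        elif len(fields) >= 6:
            rows.append({
                "quality": int(fields[1]),
                "symbol": fields[2],  # may be empty if not printable
                "count": int(fields[3]),
                "fraction": float(fields[4]),
                "cumulative": float(fields[5]),
            })
    return rows, extremes


rows, extremes = parse_histogram(SAMPLE)
```

In practice the input text would be the standard error output captured from a run with basequalhist=1; other lines on that channel are skipped because they lack the [H] prefix.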

AUTHOR

Written by German Tischler.

REPORTING BUGS

Report bugs to <[email protected]>

COPYRIGHT

Copyright © 2009-2014 German Tischler, © 2011-2014 Genome Research Limited. License GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
 
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.