NAME

bamseqchksum - produce checksums for primary data in BAM files

SYNOPSIS

bamseqchksum [options]

DESCRIPTION

bamseqchksum reads a BAM file from standard input and, for each record, calculates hash digest checksums over
[1]
flags and sequence
[2]
queryname, flags and sequence
[3]
flags, sequence and qualities
[4]
flags, sequence and source data related aux tags
where the flags are the least significant byte of the BAM FLAGS, restricted to the bits for multiple segments, first segment and last segment. If the reverse complemented bit is set, the sequence is reverse complemented and the quality string is reversed before checksumming.
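The flag masking and strand normalization described above can be sketched in Python as follows. This is an illustration only: the function name normalize and the use of plain strings are assumptions, and the sketch does not reproduce the exact byte layout bamseqchksum feeds to the hash function. The bit values are those defined for the FLAG field in the SAM specification.

```python
# Illustrative sketch; not bamseqchksum's actual record encoding.
FLAG_PAIRED = 0x1    # template has multiple segments
FLAG_REVERSE = 0x10  # sequence is stored reverse complemented
FLAG_READ1 = 0x40    # first segment in the template
FLAG_READ2 = 0x80    # last segment in the template

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def normalize(flags, seq, qual):
    """Return (masked flag byte, sequence, qualities) in original read orientation."""
    # Keep only the three bits that describe the read, dropping alignment state.
    masked = flags & (FLAG_PAIRED | FLAG_READ1 | FLAG_READ2)
    if flags & FLAG_REVERSE:
        # Undo the reverse complement so the checksum does not depend on strand.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return masked, seq, qual
```

Because alignment-dependent bits are masked out and reverse-strand records are restored to read orientation, the same read should yield the same input to the hash function before and after alignment.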
Depending on the chosen hash digest function, either the sum modulo some power of 2 or the product modulo a prime number of these checksums is taken over all BAM alignment records that are neither supplementary nor secondary. Separate sums or products are reported for all records and for QC pass records, both over the whole file and for each read group.
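A minimal Python sketch of this accumulation, assuming the additive crc32 variant (sum modulo 2^32). The function name accumulate, the record tuples and the key labels "all" and "pass" are hypothetical and chosen for illustration; the payload bytes stand in for whatever record fields are being checksummed.

```python
import zlib
from collections import defaultdict

FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_QCFAIL = 0x200         # record failed quality checks
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def accumulate(records, modulus=2**32):
    """Sum per-record CRC32 values; records are (flags, read_group, payload_bytes)."""
    sums = defaultdict(int)
    for flags, read_group, payload in records:
        # Secondary and supplementary records never contribute.
        if flags & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        digest = zlib.crc32(payload)
        keys = [("all", "all"), (read_group, "all")]
        if not flags & FLAG_QCFAIL:
            keys += [("all", "pass"), (read_group, "pass")]
        for key in keys:
            sums[key] = (sums[key] + digest) % modulus
    return dict(sums)
```

Since addition modulo a fixed number is commutative and associative, the result does not depend on the order of the records, which is what allows checksums of differently sorted files to be compared.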
The following key=value pairs can be given:
verbose=<0>: Valid values are
1:
print progress report on standard error
0:
do not print progress report
inputformat=<bam>: input file format. All versions of bamseqchksum come with support for the BAM input format. If the program is in addition linked to the io_lib package, then the following options are valid:
bam:
BAM (see http://samtools.sourceforge.net/SAM1.pdf)
sam:
SAM (see http://samtools.sourceforge.net/SAM1.pdf)
cram:
CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)
reference=: file name of the reference for CRAM input files. If this key is unset, then the CRAM file header is scanned to obtain a reference file name.
hash=<crc32prod>: hash digest used to compute checksums. All versions of biobambam support the following functions:
crc32prod:
checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2^31-1. This is the default, and it is the only option available in biobambam versions up to 0.0.174.
crc32:
checksums are computed via crc32 and combined by summing up modulo 2^32.
md5:
checksums are computed via md5 and combined by summing up modulo 2^128.
crc32prime32:
identical to crc32prod (alternate implementation for testing purposes)
crc32prime64:
checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2^64-59.
md5prime64:
checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2^64-59.
crc32prime96:
checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2^96-17.
md5prime96:
checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2^96-17.
crc32prime128:
checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2^128-159.
md5prime128:
checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2^128-159.
crc32prime160:
checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2^160-47.
md5prime160:
checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2^160-47.
crc32prime192:
checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2^192-237.
md5prime192:
checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2^192-237.
crc32prime224:
checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2^224-63.
md5prime224:
checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2^224-63.
crc32prime256:
checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2^256-189.
md5prime256:
checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2^256-189.
null:
no checksums are computed and all checksums in the program's output are 0. This option is for performance testing only.
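The multiplicative variants listed above can be illustrated with the following Python sketch. The moduli are the ones named in the option descriptions; everything else is an assumption made for illustration: the function names, the big-endian interpretation of the digest bytes, and the replacement of a zero residue by 1 so that a single record cannot collapse the product to zero. The actual implementation may treat these details differently.

```python
import hashlib
import zlib

# Prime moduli as listed in the option descriptions, keyed by bit width.
PRIMES = {
    64: 2**64 - 59, 96: 2**96 - 17, 128: 2**128 - 159, 160: 2**160 - 47,
    192: 2**192 - 237, 224: 2**224 - 63, 256: 2**256 - 189,
}


def digest_as_int(name, payload):
    """Hash payload with crc32 or a hashlib algorithm and return an integer."""
    if name == "crc32":
        return zlib.crc32(payload)
    # Assumption: digest bytes are read as one big-endian integer.
    return int.from_bytes(hashlib.new(name, payload).digest(), "big")


def combine_product(name, bits, payloads):
    """Multiply the per-record digests modulo the prime for the given width."""
    prime = PRIMES[bits]
    product = 1
    for payload in payloads:
        value = digest_as_int(name, payload) % prime
        if value == 0:
            value = 1  # assumption: keep the running product nonzero
        product = (product * value) % prime
    return product
```

For example, combine_product("sha256", 64, payloads) mimics the shape of a sha256prime64 computation over a list of byte strings. Because the modulus is prime, every nonzero residue is invertible, so multiplying in one more record never destroys the information already accumulated.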
If libmaus is compiled with support for the nettle library, then the following options are available:
sha1:
checksums are computed via sha1 and combined by summing up modulo 2^160.
sha1prime64:
checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2^64-59.
sha1prime96:
checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2^96-17.
sha1prime128:
checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2^128-159.
sha1prime160:
checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2^160-47.
sha1prime192:
checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2^192-237.
sha1prime224:
checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2^224-63.
sha1prime256:
checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2^256-189.
sha224:
checksums are computed via sha2-224 and combined by summing up modulo 2^224.
sha224prime64:
checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2^64-59.
sha224prime96:
checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2^96-17.
sha224prime128:
checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2^128-159.
sha224prime160:
checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2^160-47.
sha224prime192:
checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2^192-237.
sha224prime224:
checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2^224-63.
sha224prime256:
checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2^256-189.
sha256:
checksums are computed via sha2-256 and combined by summing up modulo 2^256.
sha256prime64:
checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2^64-59.
sha256prime96:
checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2^96-17.
sha256prime128:
checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2^128-159.
sha256prime160:
checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2^160-47.
sha256prime192:
checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2^192-237.
sha256prime224:
checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2^224-63.
sha256prime256:
checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2^256-189.
sha384:
checksums are computed via sha2-384 and combined by summing up modulo 2^384.
sha384prime64:
checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2^64-59.
sha384prime96:
checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2^96-17.
sha384prime128:
checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2^128-159.
sha384prime160:
checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2^160-47.
sha384prime192:
checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2^192-237.
sha384prime224:
checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2^224-63.
sha384prime256:
checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2^256-189.
sha512:
checksums are computed via sha2-512 and combined by summing up modulo 2^512.
sha512prime64:
checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2^64-59.
sha512prime96:
checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2^96-17.
sha512prime128:
checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2^128-159.
sha512prime160:
checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2^160-47.
sha512prime192:
checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2^192-237.
sha512prime224:
checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2^224-63.
sha512prime256:
checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2^256-189.
sha512primesums:
checksums are computed via sha2-512 and combined over multiple records by adding modulo the Mersenne prime number 2^521-1.
sha512primesums512:
checksums are computed via sha2-512 and combined over multiple records by adding modulo 2^512+75.
murmur3:
checksums are computed via MurmurHash3_x64_128 (see https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) and combined over multiple records by summing modulo 2^128.
murmur3primesums128:
checksums are computed via MurmurHash3_x64_128 (see https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) and combined over multiple records by summing modulo 2^128+51.
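As a usage illustration, the key=value pairs described above could be assembled and passed to the program from Python roughly as follows. The helper names build_argv and run_bamseqchksum are hypothetical, the sketch assumes a bamseqchksum executable is on the PATH and that it writes its report to standard output, and no attempt is made to parse that report.

```python
import subprocess


def build_argv(**options):
    """Turn keyword options into bamseqchksum key=value arguments."""
    return ["bamseqchksum"] + [f"{key}={value}" for key, value in options.items()]


def run_bamseqchksum(bam_path, **options):
    """Feed bam_path on standard input and return the program's output as text."""
    with open(bam_path, "rb") as bam:
        result = subprocess.run(
            build_argv(**options),
            stdin=bam,
            stdout=subprocess.PIPE,
            check=True,  # raise CalledProcessError on a nonzero exit status
        )
    return result.stdout.decode()
```

A call such as run_bamseqchksum("in.bam", hash="sha512primesums", verbose=0) would run the equivalent of bamseqchksum hash=sha512primesums verbose=0 with in.bam on standard input; the file name here is a placeholder.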

AUTHOR

Written by David Jackson (using code by German Tischler as a template). Extended to hash digests beyond crc32prod by German Tischler.

REPORTING BUGS

Report bugs to <[email protected]>

Copyright © 2014-2014 David Jackson, © 2014-2014 Genome Research Limited.
Copyright © 2009-2016 German Tischler, © 2011-2014 Genome Research Limited.
License GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
 
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.