bamseqchksum - produce checksums for primary data in BAM files
bamseqchksum [options]
bamseqchksum reads a BAM file from standard input and, for each record, computes
hash digest checksums over the following combinations of fields:
- [1]
- flags and sequence
- [2]
- queryname, flags and sequence
- [3]
- flags, sequence and qualities
- [4]
- flags, sequence and source data related aux tags
Here the flags are the least significant byte of the BAM FLAG field, masked so
that only the bits for multiple segments (0x1), first segment (0x40) and last
segment (0x80) remain. If the reverse complement bit (0x10) is set, the sequence
is reverse complemented and the quality string is reversed before checksumming.
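The normalisation described above can be sketched in a few lines of Python. This is an illustration only, not code from bamseqchksum: the function name normalise and the plain-string representation of sequence and qualities are assumptions made for the example. The flag bit values are those defined in the SAM specification.

```python
# Illustrative sketch; names are hypothetical, not part of bamseqchksum.
FLAG_PAIRED = 0x1    # template has multiple segments
FLAG_REVERSE = 0x10  # sequence is stored reverse complemented
FLAG_READ1 = 0x40    # first segment
FLAG_READ2 = 0x80    # last segment

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def normalise(flag, seq, qual):
    """Return (masked_flag, seq, qual) with the read in its original orientation."""
    masked = flag & (FLAG_PAIRED | FLAG_READ1 | FLAG_READ2)
    if flag & FLAG_REVERSE:
        # Undo the reverse complement so the checksum input does not
        # depend on the strand the read was aligned to.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return masked, seq, qual


# Flag 83 = 0x1 + 0x2 + 0x10 + 0x40: first of pair, aligned to the reverse strand.
print(normalise(83, "AACG", "IIH#"))  # (65, 'CGTT', '#HII')
```

With this normalisation the same read yields the same checksum input whether it is stored unaligned, aligned forward or aligned to the reverse strand.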
Depending on the chosen hash digest function, either the sum modulo some power
of 2 or the product modulo a prime number of these checksums is taken over all
BAM alignment records that are neither supplementary nor secondary. Separate
sums or products are reported for all records and for QC pass records, both
over the whole file and for each read group.
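The bookkeeping described above might look roughly like the following sketch. It assumes per-record checksums are already available as integers and uses sum-style combining modulo 2^32. The tuple layout, the accumulate function and the use of None as the key for "all read groups" are hypothetical choices for this example, not the program's internals. The flag bit values are those of the SAM specification.

```python
from collections import defaultdict

FLAG_SECONDARY = 0x100
FLAG_QCFAIL = 0x200
FLAG_SUPPLEMENTARY = 0x800
MODULUS = 2**32  # sum-style combining, as for a 32-bit checksum


def accumulate(records):
    """Combine checksums from (flag, read_group, checksum) tuples.

    Returns a dict keyed by (read_group or None, "all" or "pass").
    """
    totals = defaultdict(int)
    for flag, read_group, checksum in records:
        # Secondary and supplementary records do not contribute.
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        # QC-failed records count towards "all" but not towards "pass".
        subsets = ("all",) if flag & FLAG_QCFAIL else ("all", "pass")
        for group in (None, read_group):
            for subset in subsets:
                key = (group, subset)
                totals[key] = (totals[key] + checksum) % MODULUS
    return dict(totals)


records = [
    (0x41, "rg1", 10),
    (0x81 | FLAG_QCFAIL, "rg1", 20),
    (0x41 | FLAG_SECONDARY, "rg2", 40),  # skipped
    (0x41, "rg2", 5),
]
print(accumulate(records))
```

Because addition modulo a fixed number is commutative and associative, the totals do not depend on record order, so a sorted and an unsorted file holding the same reads give the same result.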
The following key=value pairs can be given:
verbose=<0>: Valid values are
- 1:
- print progress report on standard error
- 0:
- do not print progress report
inputformat=<bam>: input file format. All versions of bamseqchksum
come with support for the BAM input format. If the program is additionally
linked to the io_lib package, then the following options are valid:
- bam:
- BAM (see http://samtools.sourceforge.net/SAM1.pdf)
- sam:
- SAM (see http://samtools.sourceforge.net/SAM1.pdf)
- cram:
- CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)
reference=: file name of the reference for CRAM input files. If this key
is unset, then the CRAM file header is scanned to obtain a reference file
name.
hash=<crc32prod>: hash digest used to compute checksums. All
versions of biobambam support the following functions:
- crc32prod:
- checksums are computed via crc32 and combined over multiple
records by multiplication modulo the prime number 2^31-1. This is the
default and only option for biobambam versions up to 0.0.174.
- crc32:
- checksums are computed via crc32 and combined by summing up
modulo 2^32.
- md5:
- checksums are computed via md5 and combined by summing up
modulo 2^128.
- crc32prime32:
- identical to crc32prod (alternate implementation for
testing purposes)
- crc32prime64:
- checksums are computed via crc32 and combined over multiple
records by multiplication modulo the prime number 2^64-59.
- md5prime64:
- checksums are computed via md5 and combined over multiple
records by multiplication modulo the prime number 2^64-59.
- crc32prime96:
- checksums are computed via crc32 and combined over multiple
records by multiplication modulo the prime number 2^96-17.
- md5prime96:
- checksums are computed via md5 and combined over multiple
records by multiplication modulo the prime number 2^96-17.
- crc32prime128:
- checksums are computed via crc32 and combined over multiple
records by multiplication modulo the prime number 2^128-159.
- md5prime128:
- checksums are computed via md5 and combined over multiple
records by multiplication modulo the prime number 2^128-159.
- crc32prime160:
- checksums are computed via crc32 and combined over multiple
records by multiplication modulo the prime number 2^160-47.
- md5prime160:
- checksums are computed via md5 and combined over multiple
records by multiplication modulo the prime number 2^160-47.
- crc32prime192:
- checksums are computed via crc32 and combined over multiple
records by multiplication modulo the prime number 2^192-237.
- md5prime192:
- checksums are computed via md5 and combined over multiple
records by multiplication modulo the prime number 2^192-237.
- crc32prime224:
- checksums are computed via crc32 and combined over multiple
records by multiplication modulo the prime number 2^224-63.
- md5prime224:
- checksums are computed via md5 and combined over multiple
records by multiplication modulo the prime number 2^224-63.
- crc32prime256:
- checksums are computed via crc32 and combined over multiple
records by multiplication modulo the prime number 2^256-189.
- md5prime256:
- checksums are computed via md5 and combined over multiple
records by multiplication modulo the prime number 2^256-189.
- null:
- no checksums are computed and all checksums in the program's
output are 0. This option is for performance testing only.
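As a rough illustration of the two combining styles listed above, the following Python sketch computes a crc32 product modulo 2^31-1 and an md5 sum modulo 2^128 over a list of byte strings. It is not the program's implementation. How bamseqchksum serialises the fields of a record into bytes is not described here, so the inputs are arbitrary byte strings. Skipping zero residues in the product is an assumption made so that a single record cannot force the whole product to 0.

```python
import hashlib
import zlib

MERSENNE_31 = 2**31 - 1  # prime modulus named for crc32prod


def crc32_product(chunks):
    """Multiply crc32 values of all chunks modulo 2^31-1 (crc32prod style)."""
    product = 1
    for chunk in chunks:
        value = zlib.crc32(chunk) % MERSENNE_31
        # Assumption for this sketch: ignore zero residues.
        if value:
            product = (product * value) % MERSENNE_31
    return product


def md5_sum(chunks):
    """Add md5 digests of all chunks, read as integers, modulo 2^128."""
    total = 0
    for chunk in chunks:
        digest = int.from_bytes(hashlib.md5(chunk).digest(), "big")
        total = (total + digest) % 2**128
    return total


chunks = [b"record 1", b"record 2", b"record 3"]
# Both results are independent of the order of the records.
print(crc32_product(chunks) == crc32_product(chunks[::-1]))  # True
print(md5_sum(chunks) == md5_sum(chunks[::-1]))              # True
```

The byte order used to turn a digest into an integer ("big" here) is likewise a choice made for the example.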
If libmaus is compiled with support for the nettle library, then the following
options are available:
- sha1:
- checksums are computed via sha1 and combined by summing up
modulo 2^160.
- sha1prime64:
- checksums are computed via sha1 and combined over multiple
records by multiplication modulo the prime number 2^64-59.
- sha1prime96:
- checksums are computed via sha1 and combined over multiple
records by multiplication modulo the prime number 2^96-17.
- sha1prime128:
- checksums are computed via sha1 and combined over multiple
records by multiplication modulo the prime number 2^128-159.
- sha1prime160:
- checksums are computed via sha1 and combined over multiple
records by multiplication modulo the prime number 2^160-47.
- sha1prime192:
- checksums are computed via sha1 and combined over multiple
records by multiplication modulo the prime number 2^192-237.
- sha1prime224:
- checksums are computed via sha1 and combined over multiple
records by multiplication modulo the prime number 2^224-63.
- sha1prime256:
- checksums are computed via sha1 and combined over multiple
records by multiplication modulo the prime number 2^256-189.
- sha224:
- checksums are computed via sha2-224 and combined by summing
up modulo 2^224.
- sha224prime64:
- checksums are computed via sha2-224 and combined over
multiple records by multiplication modulo the prime number 2^64-59.
- sha224prime96:
- checksums are computed via sha2-224 and combined over
multiple records by multiplication modulo the prime number 2^96-17.
- sha224prime128:
- checksums are computed via sha2-224 and combined over
multiple records by multiplication modulo the prime number 2^128-159.
- sha224prime160:
- checksums are computed via sha2-224 and combined over
multiple records by multiplication modulo the prime number 2^160-47.
- sha224prime192:
- checksums are computed via sha2-224 and combined over
multiple records by multiplication modulo the prime number 2^192-237.
- sha224prime224:
- checksums are computed via sha2-224 and combined over
multiple records by multiplication modulo the prime number 2^224-63.
- sha224prime256:
- checksums are computed via sha2-224 and combined over
multiple records by multiplication modulo the prime number 2^256-189.
- sha256:
- checksums are computed via sha2-256 and combined by summing
up modulo 2^256.
- sha256prime64:
- checksums are computed via sha2-256 and combined over
multiple records by multiplication modulo the prime number 2^64-59.
- sha256prime96:
- checksums are computed via sha2-256 and combined over
multiple records by multiplication modulo the prime number 2^96-17.
- sha256prime128:
- checksums are computed via sha2-256 and combined over
multiple records by multiplication modulo the prime number 2^128-159.
- sha256prime160:
- checksums are computed via sha2-256 and combined over
multiple records by multiplication modulo the prime number 2^160-47.
- sha256prime192:
- checksums are computed via sha2-256 and combined over
multiple records by multiplication modulo the prime number 2^192-237.
- sha256prime224:
- checksums are computed via sha2-256 and combined over
multiple records by multiplication modulo the prime number 2^224-63.
- sha256prime256:
- checksums are computed via sha2-256 and combined over
multiple records by multiplication modulo the prime number 2^256-189.
- sha384:
- checksums are computed via sha2-384 and combined by summing
up modulo 2^384.
- sha384prime64:
- checksums are computed via sha2-384 and combined over
multiple records by multiplication modulo the prime number 2^64-59.
- sha384prime96:
- checksums are computed via sha2-384 and combined over
multiple records by multiplication modulo the prime number 2^96-17.
- sha384prime128:
- checksums are computed via sha2-384 and combined over
multiple records by multiplication modulo the prime number 2^128-159.
- sha384prime160:
- checksums are computed via sha2-384 and combined over
multiple records by multiplication modulo the prime number 2^160-47.
- sha384prime192:
- checksums are computed via sha2-384 and combined over
multiple records by multiplication modulo the prime number 2^192-237.
- sha384prime224:
- checksums are computed via sha2-384 and combined over
multiple records by multiplication modulo the prime number 2^224-63.
- sha384prime256:
- checksums are computed via sha2-384 and combined over
multiple records by multiplication modulo the prime number 2^256-189.
- sha512:
- checksums are computed via sha2-512 and combined by summing
up modulo 2^512.
- sha512prime64:
- checksums are computed via sha2-512 and combined over
multiple records by multiplication modulo the prime number 2^64-59.
- sha512prime96:
- checksums are computed via sha2-512 and combined over
multiple records by multiplication modulo the prime number 2^96-17.
- sha512prime128:
- checksums are computed via sha2-512 and combined over
multiple records by multiplication modulo the prime number 2^128-159.
- sha512prime160:
- checksums are computed via sha2-512 and combined over
multiple records by multiplication modulo the prime number 2^160-47.
- sha512prime192:
- checksums are computed via sha2-512 and combined over
multiple records by multiplication modulo the prime number 2^192-237.
- sha512prime224:
- checksums are computed via sha2-512 and combined over
multiple records by multiplication modulo the prime number 2^224-63.
- sha512prime256:
- checksums are computed via sha2-512 and combined over
multiple records by multiplication modulo the prime number 2^256-189.
- sha512primesums:
- checksums are computed via sha2-512 and combined over
multiple records by adding modulo the Mersenne prime number 2^521-1.
- sha512primesums512:
- checksums are computed via sha2-512 and combined over
multiple records by adding modulo 2^512-75.
- murmur3:
- checksums are computed via MurmurHash3_x64_128 (see
https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) and
combined over multiple records by summing modulo 2^128.
- murmur3primesums128:
- checksums are computed via MurmurHash3_x64_128 (see
https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) and
combined over multiple records by summing modulo 2^128+51.
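The option names above follow a regular scheme: a bare digest name means summing modulo 2 to the digest width, and a suffix primeNN means multiplication modulo the prime listed for NN bits. The sketch below encodes that scheme as a lookup helper for readers. The function describe_hash and its return layout are hypothetical and not part of biobambam. The moduli are copied from the list above.

```python
import re

# Prime moduli for the "primeNN" suffixes, as listed in the text above.
PRIME_MODULI = {
    64: 2**64 - 59, 96: 2**96 - 17, 128: 2**128 - 159, 160: 2**160 - 47,
    192: 2**192 - 237, 224: 2**224 - 63, 256: 2**256 - 189,
}
# Digest widths in bits; a bare digest name sums modulo 2**width.
DIGEST_BITS = {
    "crc32": 32, "md5": 128, "sha1": 160, "sha224": 224,
    "sha256": 256, "sha384": 384, "sha512": 512, "murmur3": 128,
}
# Digests for which primeNN variants are listed (murmur3 has none).
PRIME_VARIANTS = set(DIGEST_BITS) - {"murmur3"}
# Names that do not follow the regular scheme.
SPECIAL = {
    "crc32prod": ("crc32", "product", 2**31 - 1),
    "crc32prime32": ("crc32", "product", 2**31 - 1),
    "sha512primesums": ("sha512", "sum", 2**521 - 1),
    "sha512primesums512": ("sha512", "sum", 2**512 - 75),  # as listed above
    "murmur3primesums128": ("murmur3", "sum", 2**128 + 51),
    "null": ("null", "none", None),
}


def describe_hash(name):
    """Return (digest, combining operation, modulus) for a hash= value."""
    if name in SPECIAL:
        return SPECIAL[name]
    if name in DIGEST_BITS:
        return name, "sum", 2**DIGEST_BITS[name]
    match = re.fullmatch(r"([a-z]+\d+)prime(\d+)", name)
    if match:
        digest, bits = match.group(1), int(match.group(2))
        if digest in PRIME_VARIANTS and bits in PRIME_MODULI:
            return digest, "product", PRIME_MODULI[bits]
    raise ValueError(f"unknown hash name: {name}")


print(describe_hash("sha256prime96")[:2])  # ('sha256', 'product')
```

Which of these names a given build accepts still depends on whether libmaus was compiled with nettle support, which this helper does not check.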
Written by David Jackson (using code by German Tischler as a template). Extended
to hash digests beyond crc32prod by German Tischler.
Report bugs to <[email protected]>
Copyright © 2014-2014 David Jackson, © 2014-2014 Genome Research
Limited. Copyright © 2009-2016 German Tischler, © 2011-2014
Genome Research Limited. License GPLv3+: GNU GPL version 3
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO
WARRANTY, to the extent permitted by law.