bzz - DjVu general purpose compression utility.
bzz -e[blocksize] inputfile outputfile
bzz -d inputfile outputfile
The first form of the command line (option -e) compresses the data from
file inputfile and writes the compressed data into outputfile.
The second form of the command line (option -d) decompresses file
inputfile and writes the output to outputfile.
-d
    Decoding mode.
-e[blocksize]
    Encoding mode. The optional argument blocksize specifies the size, in
    kilobytes, of the input file blocks processed by the Burrows-Wheeler
    transform. The default block size is 2048 KB. The maximum block size
    is 4096 KB. Specifying a larger block size usually produces higher
    compression ratios and increases the memory requirements of both the
    encoder and decoder. Specifying a block size larger than the input
    file brings no benefit.
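
As an illustration of how the two options fit together, the following is a minimal Python sketch that builds the argument list for the encoding form of the command. The helper name `bzz_encode_args` is hypothetical, and the lower bound of 1 KB is an assumption made for the sketch; only the 4096 KB upper bound and the attached `-e[blocksize]` syntax come from the description above.

```python
from typing import List, Optional

MAX_BLOCK_KB = 4096  # maximum block size given in the option description


def bzz_encode_args(src: str, dst: str,
                    block_kb: Optional[int] = None) -> List[str]:
    """Build the argv for `bzz -e[blocksize] inputfile outputfile`.

    Hypothetical helper. With block_kb=None the default block size
    (2048 KB per the description) is left for bzz to choose.
    """
    if block_kb is None:
        return ["bzz", "-e", src, dst]
    # Lower bound of 1 KB is an assumption of this sketch.
    if not 1 <= block_kb <= MAX_BLOCK_KB:
        raise ValueError(f"block size must be 1..{MAX_BLOCK_KB} KB")
    # The block size is attached directly to -e, with no space.
    return ["bzz", f"-e{block_kb}", src, dst]
```

The resulting list can be handed to `subprocess.run` on a system where bzz is installed; building the list separately keeps the range check testable without running the program.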
The Burrows-Wheeler transform is performed using a combination of the
Karp-Miller-Rosenberg and the Bentley-Sedgewick algorithms. This is comparable
to (Sadakane, DCC 98) with a slightly more flexible ranking scheme. Symbols
are then ordered according to a running estimate of their occurrence
frequencies. The symbol ranks are then coded using a simple fixed tree and the
ZP binary adaptive coder (Bottou, DCC 98).
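
For readers unfamiliar with the transform itself, here is a deliberately naive Python sketch of the Burrows-Wheeler transform and its inverse. It sorts all rotations of the block directly, which is far slower than the Karp-Miller-Rosenberg and Bentley-Sedgewick combination mentioned above, and it does not model the frequency ranking or the ZP coder. The function names are hypothetical and not part of bzz.

```python
from typing import Tuple


def bwt(block: bytes) -> Tuple[bytes, int]:
    """Naive BWT: sort all rotations, return (last column, primary index)."""
    n = len(block)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotated content.
    starts = sorted(range(n), key=lambda i: block[i:] + block[:i])
    # The last column holds the byte that precedes each rotation start.
    last = bytes(block[(i - 1) % n] for i in starts)
    # The primary index is the row holding the unrotated block.
    return last, starts.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Rebuild the original block from the last column and primary index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte value links each row of the
    # sorted matrix to the row that starts one position later.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)
```

The transform tends to group identical bytes together, which is what makes the later ranking and entropy coding stages effective.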
The Burrows-Wheeler transform is also used in the well-known compressor
bzip2. The originality of bzz is the use of the ZP adaptive coder. The
adaptation noise can cost up to 5 percent in file size, but this
penalty is usually offset by the benefits of adaptation.
The following table shows comparative results (in bits per character) on the
Canterbury Corpus ( http://corpus.canterbury.ac.nz ). The very good bzz
performance on the spreadsheet file excl puts the weighted average ahead of
much more sophisticated compressors such as fsmx.
Compression performance

|          | text | fax  | csrc | excl | sprc | tech | poem | html | lisp | man  | play | Weighted | Average |
|----------|------|------|------|------|------|------|------|------|------|------|------|----------|---------|
| compress | 3.27 | 0.97 | 3.56 | 2.41 | 4.21 | 3.06 | 3.38 | 3.68 | 3.90 | 4.43 | 3.51 | 2.55     | 3.31    |
| gzip -9  | 2.85 | 0.82 | 2.24 | 1.63 | 2.67 | 2.71 | 3.23 | 2.59 | 2.65 | 3.31 | 3.12 | 2.08     | 2.53    |
| bzip2 -9 | 2.27 | 0.78 | 2.18 | 1.01 | 2.70 | 2.02 | 2.42 | 2.48 | 2.79 | 3.33 | 2.53 | 1.54     | 2.23    |
| ppmd     | 2.31 | 0.99 | 2.11 | 1.08 | 2.68 | 2.19 | 2.48 | 2.38 | 2.43 | 3.00 | 2.53 | 1.65     | 2.20    |
| fsmx     | 2.10 | 0.79 | 1.89 | 1.48 | 2.52 | 1.84 | 2.21 | 2.24 | 2.29 | 2.91 | 2.35 | 1.63     | 2.06    |
| bzz      | 2.25 | 0.76 | 2.13 | 0.78 | 2.67 | 2.00 | 2.40 | 2.52 | 2.60 | 3.19 | 2.52 | 1.44     | 2.16    |
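
Bits per character is the compressed size in bits divided by the original size in bytes. As a small sketch of how such a figure can be computed for one's own files, assuming one character per byte (the function name is hypothetical):

```python
def bits_per_character(original_size: int, compressed_size: int) -> float:
    """Return compressed bits per input byte.

    Both sizes are in bytes; assumes one character per byte.
    """
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 8.0 * compressed_size / original_size
```

Read the other way, a figure of 2.25 bits per character corresponds to a compressed file of about 28 percent of the original size (2.25 / 8).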
Note that DjVu contributors have several entries in this table. Program
compress was written some time ago by Joe Orost. Program ppmd is an
improvement of the PPM-C method invented by Paul Howard.
Program bzz was written by Léon Bottou and was then improved by
Andrei Erofeev, Bill Riemers and many others.
djvu(1), compress(1), gzip(1), bzip2(1)