ovdb - Overview storage method for INN
The ovdb overview is a storage method that uses the Berkeley DB library
to store overview data. It requires version 4.4 or later of the
Berkeley DB library (4.7+ is recommended because older versions suffer
from various issues).
The ovdb overview method makes use of the full transaction/logging/locking
functionality of the Berkeley DB environment. Berkeley DB may be
downloaded from
<
https://www.oracle.com/database/technologies/related/berkeleydb.html>
and is needed to build the ovdb backend.
There are several versions of the ovdb storage method:
- •
- Version 1, the initial version shipped with
INN 2.3.0 up to INN 2.3.5.
- •
- Version 2, with improved performance, since
INN 2.4.0.
- •
- Version 3, corresponding to version 2 with compression
enabled, starting with INN 2.5.0.
If you have a database created with a previous version of ovdb, your database
will need to be upgraded using
ovdb_init. See the
ovdb_init(8)
man page for upgrade instructions, as well as the "COMPRESSION"
section below.
Note that when the Berkeley DB library is updated to a newer version, the
ovdb database also needs being upgraded.
If the Berkeley DB library is found at configure time, INN will be built
with Berkeley DB support unless the
--without-bdb flag is
explicitly passed to configure. By default, configure will search for
Berkeley DB in standard locations; there will be a message in the
configure output indicating the pathname that will be used.
You can override this pathname by adding a path to the option, for instance
--with-bdb=/usr/BerkeleyDB.4.4. This directory is expected to have
subdirectories
include and
lib (
lib32 and
lib64
are also checked), containing respectively
db.h, and the library
itself. In case non-standard paths to the Berkeley DB libraries are
used, one or both of the options
--with-bdb-include and
--with-bdb-lib can be given to configure with a path.
The ovdb database may take up more disk space for a given spool than the other
overview methods. Plan on needing at least 1.1 KB for every article in
your spool (not counting crossposts). So, if you have 5 million articles,
you'll need at least 5.5 GB of disk space for ovdb. With compression
enabled, this estimate changes to 0.9 KB per article, so you'll need at
least 4.5 GB of disk space for 5 million articles. See the
"COMPRESSION" section below. Plus, you'll need additional space for
transaction logs: at least 100 MB. By default, the transaction logs go
in the same directory as the database. To improve performance, they can be
placed on a different disk -- see the "DB_CONFIG" section.
To enable the ovdb overview method, set the
ovmethod parameter in
inn.conf to "ovdb". The ovdb database is stored in the
directory specified by the
pathoverview parameter in
inn.conf.
This is the "DB_HOME" directory. To start out, this directory should
be empty (other than an optional
DB_CONFIG file; see
"DB_CONFIG" for details), and
innd (or
makehistory)
will create the files as necessary in that directory. Also, make sure the
directory is owned by the news user.
Other parameters for configuring ovdb are in the
ovdb.conf configuration
file. The following parameters can be set in that file:
- compress
- If INN was compiled with zlib, and this compress
parameter is true, ovdb will compress overview records that are longer
than 600 bytes. See the "COMPRESSION" section below.
- cachesize
- Size of the memory pool cache, in kilobytes. The cache will
have a backing store file in the DB directory which will be at least as
big. In general, the bigger the cache, the better. Use "ovdb_stat
-m" to see cache hit percentages. To make a change of this parameter
take effect, shut down and restart INN (be sure to kill all of the
nnrpd processes when shutting down). Default is 8000 (KB), which is
adequate for small to medium-sized servers. Large servers will probably
need at least 20000 (KB).
- ncache
- Number of regions across which to split the cache. The
region size is equal to cachesize divided by ncache. Default
is 1 for ncache, that is to say the cache will be allocated
contiguously in memory.
- numdbfiles
- Overview data is split between this many files. Currently,
innd will keep all of the files open, so don't set this too high or
innd may run out of file descriptors. nnrpd only opens one
at a time, regardless. May be set to one, or just a few, but only do that
if your OS supports large (> 2 GB) files. Changing this
parameter has no effect on an already-established database. Default is
32.
- txn_nosync
- If txn_nosync is set to false, Berkeley DB flushes
the log after every transaction. This minimizes the number of transactions
that may be lost in the event of a crash, but results in significantly
degraded performance. Default is true.
- useshm
- If useshm is set to true, Berkeley DB will
use shared memory instead of mmap for its environment regions (cache,
lock, etc). With some platforms, this may improve performance. Default is
false.
- shmkey
- Sets the shared memory key used by Berkeley DB when
useshm is true. Berkeley DB will create several (usually 5)
shared memory segments, using sequentially numbered keys starting with
"shmkey". Choose a key that does not conflict with any existing
shared memory segments on your system. Default is 6400.
- pagesize
- Sets the page size for the DB files (in bytes). Must be a
power of 2. Best choices are 4096 or 8192. The default is 8192. Changing
this parameter has no effect on an already-established database.
- minkey
- Sets the minimum number of keys per page. See the
Berkeley DB documentation for more information. Default is based on
page size and whether compression is enabled:
default_minkey = MAX(2, pagesize / 2600) if compress is false
default_minkey = MAX(2, pagesize / 1500) if compress is true
The lowest allowed minkey is 2. Setting minkey higher than the
default is not recommended, as it will cause the databases to have a lot
of overflow pages. Changing this parameter has no effect on an
already-established database.
- maxlocks
- Sets the Berkeley DB lk_max parameter, which
is the maximum number of locks that can exist in the database at the same
time. Default is 4000.
- nocompact
- The nocompact parameter affects the behaviour of
expireover. The expireover function in ovdb can do its job
in one of two ways: by simply deleting expired records from the database;
or by re-writing the overview records into a different location leaving
out the expired records. The first method is faster, but it leaves 'holes'
that result in space that can not immediately be reused. The second method
'compacts' the records by rewriting them.
If this parameter is set to 0, expireover will compact all
newsgroups; if set to 1, expireover will not compact any
newsgroups; and if set to a value greater than one, expireover will
only compact groups that have less than that number of articles.
Experience has shown that compacting has minimal effect (other than making
expireover take longer) so the default is 1. This parameter will
probably be removed in the future.
- readserver
- When the readserver parameter is set to false, each
nnrpd process directly accesses the Berkeley DB environment.
The process of attaching to the database (and detaching when finished) is
fairly expensive, and can result in high loads in situations when there
are lots of reader connections of relatively short duration.
When the readserver parameter is set to true, the nnrpd
processes will access overview via a helper server ( ovdb_server
-- which is started by ovdb_init). All ovdb reads will then
be funnelled through a single process with a cleaner interface to the
underlying Berkeley DB database. This will result in cleaner
shutdowns for the database, improving stability and avoiding deadlocks,
timing issues and corrupted databases. That's why you should try to set
this parameter to true if you are experiencing any instability in the ovdb
overview method.
Default value is true.
- numrsprocs
- This parameter is only used when readserver is true.
It sets the number of ovdb_server processes. As each
ovdb_server can process only one transaction at a time, running
more servers can improve reader response times. Default is 5.
- maxrsconn
- This parameter is only used when readserver is true.
It sets a maximum number of readers that a given ovdb_server
process will serve at one time. This means the maximum number of readers
for all of the ovdb_server processes is (numrsprocs *
maxrsconn). This does not limit the actual number of
readers, since nnrpd will fall back to opening the database
directly if it can't connect to an ovdb_server. Default is 0, which
means an unlimited number of connections is allowed.
The ovdb storage method has the ability to compress overview data before it is
stored into the database. In addition to consuming less disk space,
compression keeps the average size of the database keys smaller. This in turn
increases the average number of keys per page, which can significantly improve
performance and also helps keep the database more compact. This feature
requires that INN be built with zlib. Only records larger than 600 bytes get
compressed, because that is the point at which compression starts to become
significant.
If compression is not enabled (either from the
compress option in
ovdb.conf or INN was not built with zlib support), the database will be
backward compatible with older versions of ovdb. However, if compression is
enabled, the database is marked with a newer version that will prevent older
versions of ovdb from opening the database.
You can upgrade an existing database to use compression simply by setting
compress to true in
ovdb.conf. Note that existing records in the
database will remain uncompressed; only new records added after enabling
compression will be compressed.
If you disable compression on a database that previously had it enabled, new
records will be stored uncompressed, but the database will still be
incompatible with older versions of ovdb (and will also be incompatible with
this version of ovdb if INN was not built with zlib support). So to downgrade
to a completely uncompressed database, you will have to rebuild the database
using
makehistory.
A file called
DB_CONFIG may be placed in the database directory (
pathoverview in
inn.conf) to customize where the various
database files and transaction logs are written. By default, all of the files
are written in the "DB_HOME" directory. One way to improve
performance is to put the transaction logs on a different disk. To do this,
put:
DB_LOG_DIR /path/to/logs
in the
DB_CONFIG file. If the pathname you give starts with a
"/", it is treated as an absolute path; otherwise, it is relative to
the "DB_HOME" directory. Make sure that any directories you specify
exist and have proper ownership/mode before starting INN, because they won't
be created automatically. Also, don't change the
DB_CONFIG file while
anything that uses ovdb is running.
Another thing that you can do with this file is to split the overview database
across multiple disks. In the
DB_CONFIG file, you can list directories
that Berkeley DB will search when it goes to open a database.
For example, let's say that you have
pathoverview set to
/mnt/overview and you have four additional file systems created on
/mnt/ovX. You would create a file
/mnt/overview/DB_CONFIG
containing the following lines:
set_data_dir /mnt/overview
set_data_dir /mnt/ov1
set_data_dir /mnt/ov2
set_data_dir /mnt/ov3
set_data_dir /mnt/ov4
Distribute your
ovNNNNN files into the four filesystems (say, 8 each).
When called upon to open a database file, the db library will look for it in
each of the specified directories (in order). If said file is not found, one
will be created in the first of those directories.
Whenever you change
DB_CONFIG or move database files around, make sure
all news processes that use the database are shut down first (including
nnrpd processes).
The
DB_CONFIG functionality is part of Berkeley DB itself, rather
than something provided by ovdb. See the Berkeley DB documentation for
complete details for the version of Berkeley DB that you're running.
When starting the news system,
rc.news will invoke the
ovdb_init
program. See the
ovdb_init(8) man page for information about the tasks
it performs.
ovdb_init must be run before using the database.
And when stopping INN,
rc.news kills the
ovdb_monitor processes
after the other INN processes have been shut down.
Problems relating to ovdb are logged to
news.err with "OVDB" in
the error message.
INN programs that use overview will fail to start up if the
ovdb_monitor
processes aren't running. Be sure to run
ovdb_init before running
anything that accesses overview.
Also, INN programs that use overview will fail to start up if the user running
them is not the news user.
If a program accessing the database crashes, or otherwise exits uncleanly, it
might leave a stale lock in the database. This lock could cause other
processes to deadlock on that stale lock. To fix this, shut down all news
processes (using "kill -9" if necessary) and then restart.
ovdb_init should perform a recovery operation which will remove the
locks and repair damage caused by killing the deadlocked processes.
-
pathetc/inn.conf
- The ovmethod and pathoverview parameters are
relevant to ovdb.
-
pathetc/ovdb.conf
- Optional configuration file for tuning. See
"CONFIGURATION" above.
- pathoverview
- Directory where the database goes. Berkeley DB calls
it the "DB_HOME" directory.
-
pathoverview/DB_CONFIG
- Optional file to configure the layout of the database
files.
-
pathrun/ovdb.sem
- A file that gets locked by every process that is accessing
the database. This is used by ovdb_init to determine whether the
database is active or quiescent.
-
pathrun/ovdb_monitor.pid
- Contains the process ID of ovdb_monitor.
Implement a way to limit how many databases can be open at once (to reduce file
descriptor usage); maybe using something similar to the cache code in legacy
ov3.c file.
Written by Heath Kehoe <
[email protected]> for InterNetNews.
inn.conf(5),
innd(8),
makehistory(8),
nnrpd(8),
ovdb_init(8),
ovdb_monitor(8),
ovdb_stat(8).
Berkeley DB documentation: in the
docs directory of the
Berkeley DB source distribution, or on the Oracle Berkeley DB
web page
(<
https://www.oracle.com/database/technologies/related/berkeleydb.html>).