storage.conf - Configuration file for storage manager
DESCRIPTION
The file pathetc/storage.conf contains the rules to be used in assigning
articles to different storage methods. These rules determine where incoming
articles will be stored.
The storage manager is a unified interface between INN and a variety of
different storage methods, allowing the news administrator to choose between
different storage methods with different trade-offs (or even use several at
the same time for different newsgroups, or articles of different sizes). The
rest of INN need not care what type of storage method was used for a given
article; the storage manager will figure this out automatically when that
article is retrieved via the storage API. Note that you may also want to see
the options provided in inn.conf(5) regarding article storage.
The storage.conf file consists of a series of storage method entries.
Blank lines and lines beginning with a number sign ("#") are
ignored. The maximum number of characters in each line is 255. The order of
entries in this file is important; see below.
Each entry specifies a storage method and a set of rules. Articles which match
all of the rules of a storage method entry will be stored using that storage
method; if an article matches multiple storage method entries, the first one
will be used. Each entry is formatted as follows:
method <methodname> {
    class: <storage_class>
    newsgroups: <wildmat>
    size: <minsize>[,<maxsize>]
    expires: <mintime>[,<maxtime>]
    options: <options>
    exactmatch: <bool>
}
If a value contains spaces or tabs, it must be enclosed in double quotes
(""). To include a number sign ("#") or a double quote verbatim in a value,
escape it with a backslash ("\").
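A small tokenizer for the text that follows a key can illustrate these quoting rules. This is a minimal Python sketch and not INN's own configuration parser. The helper name parse_value is hypothetical. The sketch also assumes that an unescaped number sign starts a comment and that whitespace outside quotes ends a value.

```python
def parse_value(text: str) -> str:
    """Extract one value from the text after "key:" (hypothetical helper)."""
    out = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # A backslash makes the next character literal ("\#" or "\"").
            out.append(text[i + 1])
            i += 2
            continue
        if ch == '"':
            # Double quotes let spaces and tabs be part of a single value.
            in_quotes = not in_quotes
        elif ch == "#":
            # Assumption: an unescaped number sign always starts a comment.
            break
        elif ch in " \t" and not in_quotes:
            if out:
                break  # unquoted whitespace after the value ends it
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(parse_value('  "value with spaces"'))      # value with spaces
print(parse_value(r'a\#b # trailing comment'))   # a#b
```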
<methodname> is the name of a storage method to use for articles which
match the rules of this entry. The currently available storage methods are:
cnfs
timecaf
timehash
tradspool
trash
See the "STORAGE METHODS" section below for more details.
The meanings of the keys in each storage method entry are as follows:
- class: <storage_class>
- An identifier for this storage method entry.
<storage_class> should be a number between 0 and 255. It should be
unique across all of the entries in this file. It is mainly used for
specifying expiration times by storage class as described in
expire.ctl(5); "timehash" and "timecaf" also use it to name the
top-level directory in which articles accepted by this storage class are
stored. The assignment of a particular number to a storage class is
arbitrary but permanent (since it is used in storage tokens). For
instance, storage classes can be numbered sequentially in storage.conf.
- newsgroups: <wildmat>
- Which newsgroups are stored using this storage method.
<wildmat> is a uwildmat pattern which is matched against the
newsgroups an article is posted to. If storeonxref in inn.conf is true,
this pattern will be matched against the newsgroup names in the Xref
header field body; otherwise, it will be matched against the newsgroup
names in the Newsgroups header field body (see inn.conf(5) for discussion
of the differences between these possibilities). Poison wildmat
expressions (expressions starting with "@") are allowed and can be used
to exclude certain group patterns: articles crossposted to poisoned
newsgroups will not be stored using this storage method. The
comma-separated patterns in <wildmat> are matched in order, so the last
pattern that matches a given newsgroup determines the result.
There is no default newsgroups pattern; if an entry should match all
newsgroups, use an explicit "newsgroups: *".
- size: <minsize>[,<maxsize>]
- A range of article sizes (in bytes) which should be stored
using this storage method. If <maxsize> is 0 or not given, there is no
upper bound other than maxartsize in inn.conf.
The size: field is optional and may be omitted entirely if you want
articles of any size to be stored in this storage method (if, of course,
these articles fulfill all the other requirements of this storage method
entry). By default, <minsize> is set to 0.
- expires: <mintime>[,<maxtime>]
- A range of article expiration times; articles whose requested
expiration time falls in this range are stored using this storage
method. Be careful; this is less useful than it may
appear at first. This is based only on the Expires header field of
the article, not on any local expiration policies or anything in
expire.ctl! If <mintime> is non-zero, then this entry will
not match any article without an Expires header field. This key is
therefore only really useful for assigning articles with requested longer
expire times to a separate storage method. Articles only match if the time
until expiration (that is to say, the amount of time into the future that
the Expires header field of the article requests that it remain around)
falls in the interval specified by <mintime> and <maxtime>.
The format of these parameters is "0d0h0m0s" (days, hours,
minutes, and seconds into the future). If <maxtime> is
"0s" or is not specified, there is no upper bound on expire
times falling into this entry (note that this key has no effect on when
the article will actually be expired, but only on whether or not the
article will be stored using this storage method). This field is also
optional and may be omitted entirely if you do not want to store articles
according to their Expires header field, if any.
A <mintime> value greater than "0s" implies that this
storage method won't match any article without an Expires header
field.
- options: <options>
- This key is for passing special options to storage methods
that require them (currently only "cnfs"). See the "STORAGE METHODS"
section below for a description of its use.
- exactmatch: <bool>
- If this key is set to true, every newsgroup in the Newsgroups
header field body of an incoming article must match the newsgroups
pattern of this entry. (Normally, any non-zero number of matching
newsgroups is sufficient, provided no newsgroup matches a poison wildmat
as described above.) This is a boolean value; "true", "yes" and "on" can
be used to enable this key, and the case of these values is not
significant. The default is false.
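The size: and expires: rules amount to simple range checks. The following Python sketch is only an illustration, and its helper names are hypothetical. It makes four assumptions:

- Both ends of each range are inclusive.
- Any subset of the d/h/m/s components may appear, in that order.
- The time until expiration has already been computed from the Expires header field.
- An article without that header field matches whenever <mintime> is zero.

```python
import re
from typing import Optional

# Accepts "0d0h0m0s" and, as an assumption, any subset of its components
# in that order, such as "7d" or "1d12h".
_TIME_RE = re.compile(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_time(spec: str) -> int:
    """Convert a "0d0h0m0s"-style string into a number of seconds."""
    m = _TIME_RE.fullmatch(spec)
    if not spec or m is None:
        raise ValueError(f"bad time specification: {spec!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def in_range(value: int, low: int, high: int) -> bool:
    """Inclusive range check; high == 0 means there is no upper bound."""
    return value >= low and (high == 0 or value <= high)


def size_matches(size: int, minsize: int = 0, maxsize: int = 0) -> bool:
    """size: <minsize>[,<maxsize>]; omitting the key behaves like 0,0."""
    return in_range(size, minsize, maxsize)


def expires_matches(seconds_left: Optional[int],
                    mintime: str = "0s", maxtime: str = "0s") -> bool:
    """expires: <mintime>[,<maxtime>], given the time until requested expiry."""
    low, high = parse_time(mintime), parse_time(maxtime)
    if seconds_left is None:
        # No Expires header field: a non-zero <mintime> can never match.
        return low == 0
    return in_range(seconds_left, low, high)


print(size_matches(60_000, 50_000))                # True
print(expires_matches(None, "7d"))                 # False
print(expires_matches(10 * 86_400, "7d", "30d"))   # True
```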
If an article matches all of the constraints of an entry, it is stored via that
storage method and is associated with that <storage_class>. This file is
scanned in order and the first matching entry is used to store the article.
If an article does not match any entry, either by being posted to a newsgroup
which does not match any of the <wildmat> patterns or by being outside
the size and expires ranges of all entries whose newsgroups pattern it does
match, the article is not stored and is rejected by innd. When this
happens, the error message:
cant store article: no matching entry in storage.conf
is logged to syslog. If you want to silently drop articles matching certain
newsgroup patterns or size or expires ranges, assign them to the
"trash" storage method rather than having them not match any storage
method entry.
STORAGE METHODS
Currently, there are five storage methods available. Each method has its pros
and cons; you can choose any mixture of them as is suitable for your
environment. Note that each method has an attribute EXPENSIVESTAT which
indicates whether checking the existence of an article is expensive or not.
This is used when running expireover(8).
- cnfs
- The "cnfs" storage method stores articles in
large cyclic buffers (CNFS stands for Cyclic News File System). Articles
are stored in CNFS buffers in arrival order, and when the buffer fills, it
wraps around to the beginning and stores new articles over the top of the
oldest articles in the buffer. The expire time of articles stored in CNFS
buffers is therefore entirely determined by how long it takes the buffer
to wrap around, which depends on how quickly data is being stored in it.
(This method is therefore said to have self-expire functionality. It also
means that when an article is cancelled, the space it occupies is not
reused until the cycbuff rolls over and new articles overwrite it.)
EXPENSIVESTAT is false for this method.
CNFS has its own configuration file, cycbuff.conf, which describes
some subtleties of the basic scheme given above. Storage method
entries for the "cnfs" storage method must have an options:
field specifying the metacycbuff into which articles matching that entry
should be stored; see cycbuff.conf(5) for details on metacycbuffs.
Advantages: By far the fastest of all storage methods (except for
"trash"), since it eliminates the overhead of dealing with a
file system and creating new files. Unlike all other storage methods, it
does not require manual article expiration. With CNFS, the server will
never throttle itself due to a full spool disk, and groups are restricted
to just the buffer files given so that they can never use more than the
amount of disk space allocated to them.
Disadvantages: Article retention times are more difficult to control because
old articles are overwritten automatically. Attacks on Usenet, such as
flooding or massive amounts of spam, can result in wanted articles
expiring much faster than intended (with no warning).
- timecaf
- This method stores multiple articles in one file, whose
name is based on the article's arrival time and the storage class. The
file name will be:
<patharticles>/timecaf-nn/bb/aacc.CF
where "nn" is the hexadecimal value of <storage_class>,
"bb" and "aacc" are the hexadecimal components of the
arrival time, and "CF" is a hardcoded extension. (The arrival
time, in seconds since the epoch, is converted to hexadecimal and
interpreted as 0xaabbccdd, with "aa", "bb", and
"cc" used to build the path.) This method does not have
self-expire functionality (meaning expire has to run periodically
to delete old articles, as well as cancelled articles if
immediatecancel is not set to true in inn.conf).
EXPENSIVESTAT is false for this method.
Advantages: It is roughly four times faster than "timehash" for
article writes, since much of the file system overhead is bypassed, while
still retaining the same fine control over article retention time.
Disadvantages: Using this method means giving up all but the most careful
manual fiddling with the article spool; in this respect, it resembles
"cnfs". As one of the newer and least widely used storage types,
"timecaf" has not been as thoroughly tested as the other
methods.
- timehash
- This method is very similar to "timecaf" except
that each article is stored in a separate file. The name of the file for a
given article will be:
<patharticles>/time-nn/bb/cc/yyyy-aadd
where "nn" is the hexadecimal value of <storage_class>,
"yyyy" is a hexadecimal sequence number, and "bb",
"cc", and "aadd" are components of the arrival time in
hexadecimal (the arrival time is interpreted as documented above under
"timecaf"). This method does not have self-expire functionality.
Cancelled articles are removed immediately. EXPENSIVESTAT is true for this
method.
Advantages: Heavy traffic groups do not cause bottlenecks, and a fine
control of article retention time is still possible.
Disadvantages: The ability to easily find all articles in a given newsgroup
and manually fiddle with the article spool is lost, and INN still suffers
from speed degradation due to file system overhead (creating and deleting
individual files is a slow operation).
- tradspool
- Traditional spool, or "tradspool", is the
traditional news article storage format. Each article is stored in an
individual text file named:
<patharticles>/news/group/name/nnnnn
where "news/group/name" is the name of the newsgroup to which the
article was posted with each period changed to a slash, and
"nnnnn" is the sequence number of the article in that newsgroup.
For crossposted articles, the article is linked into each newsgroup to
which it is crossposted (using either hard or symbolic links). This is the
way versions of INN prior to 2.0 stored all articles, as well as being the
article storage format used by C News and earlier news systems. This
method does not have self-expire functionality. Cancelled articles are
removed immediately. EXPENSIVESTAT is true for this method.
Advantages: It is widely used and well-understood; it can read article
spools written by older versions of INN and it is compatible with all
third-party INN add-ons. This storage mechanism provides easy and direct
access to the articles stored on the server, makes writing programs that
fiddle with the news spool very easy, and gives fine control over article
retention times.
Disadvantages: It takes a very fast file system and I/O system to keep up
with current Usenet traffic volumes due to file system overhead. Groups
with heavy traffic tend to create a bottleneck because of inefficiencies
in storing large numbers of article files in a single directory. It
requires a nightly expire program to delete old articles out of the news
spool, a process that can slow down the server for several hours or
more.
- trash
- This method silently discards all articles stored in it.
Its only real uses are for testing and for silently discarding articles
matching a particular storage method entry (for whatever reason). Articles
stored in this method take up no disk space and can never be retrieved, so
this method has self-expire functionality of a sort. EXPENSIVESTAT is
false for this method.
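The file names described above for "timecaf", "timehash", and "tradspool" can be derived from an arrival time and a storage class, or, for "tradspool", from a newsgroup and an article number. The following Python sketch only restates the documented layouts and is not INN's code. The function names are hypothetical. The zero-padding widths are assumed from the lengths of the placeholders ("nn", "bb", "yyyy", and so on). The tradspool article number is assumed to be written in plain decimal.

```python
from typing import Tuple


def split_arrival(arrival: int) -> Tuple[int, int, int, int]:
    """Split an arrival time read as 0xaabbccdd into (aa, bb, cc, dd)."""
    return ((arrival >> 24) & 0xFF, (arrival >> 16) & 0xFF,
            (arrival >> 8) & 0xFF, arrival & 0xFF)


def timecaf_path(patharticles: str, storage_class: int, arrival: int) -> str:
    """<patharticles>/timecaf-nn/bb/aacc.CF"""
    aa, bb, cc, _ = split_arrival(arrival)
    return (f"{patharticles}/timecaf-{storage_class:02x}/"
            f"{bb:02x}/{aa:02x}{cc:02x}.CF")


def timehash_path(patharticles: str, storage_class: int, arrival: int,
                  seqnum: int) -> str:
    """<patharticles>/time-nn/bb/cc/yyyy-aadd"""
    aa, bb, cc, dd = split_arrival(arrival)
    return (f"{patharticles}/time-{storage_class:02x}/{bb:02x}/{cc:02x}/"
            f"{seqnum:04x}-{aa:02x}{dd:02x}")


def tradspool_path(patharticles: str, newsgroup: str, number: int) -> str:
    """<patharticles>/news/group/name/nnnnn"""
    return f"{patharticles}/{newsgroup.replace('.', '/')}/{number}"


arrival = 0x12345678
print(timecaf_path("<patharticles>", 4, arrival))
# <patharticles>/timecaf-04/34/1256.CF
print(timehash_path("<patharticles>", 5, arrival, 1))
# <patharticles>/time-05/34/56/0001-1278
print(tradspool_path("<patharticles>", "comp.lang.c", 12345))
# <patharticles>/comp/lang/c/12345
```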
EXAMPLES
The following sample storage.conf file would store all articles posted to
alt.binaries.* in the "BINARIES" CNFS metacycbuff, all articles over
roughly 50 KB in any other hierarchy in the "LARGE" CNFS
metacycbuff, all other articles in alt.* in one timehash class, and all other
articles in any newsgroups in a second timehash class, except for the
internal.* hierarchy which is stored in traditional spool format.
method tradspool {
    class: 1
    newsgroups: internal.*
}
method cnfs {
    class: 2
    newsgroups: alt.binaries.*
    options: BINARIES
}
method cnfs {
    class: 3
    newsgroups: *
    size: 50000
    options: LARGE
}
method timehash {
    class: 4
    newsgroups: alt.*
}
method timehash {
    class: 5
    newsgroups: *
}
Notice that the last storage method entry will catch everything. This is a good
habit to get into; make sure that you have at least one catch-all entry just
in case something you did not expect falls through the cracks. Notice also
that the special rule for the internal.* hierarchy is first, so it will catch
even articles crossposted to alt.binaries.* or over 50 KB in size.
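A short Python sketch can trace the first-match rule through the sample above. It is a simplification rather than INN's implementation:

- The Entry class and the select_entry function are hypothetical.
- Each newsgroups pattern is a single glob matched with Python's fnmatch.fnmatchcase, not a full uwildmat expression.
- Only the newsgroups and size rules are modelled, and inclusive size bounds are assumed.
- The final message merely mimics the one innd logs.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Entry:
    """One storage method entry, reduced to the rules modelled here."""
    method: str
    storage_class: int
    newsgroups: str      # a single glob, not a full uwildmat expression
    minsize: int = 0
    maxsize: int = 0     # 0 means no upper bound
    options: str = ""


# The sample storage.conf above, in file order.
SAMPLE = [
    Entry("tradspool", 1, "internal.*"),
    Entry("cnfs", 2, "alt.binaries.*", options="BINARIES"),
    Entry("cnfs", 3, "*", minsize=50000, options="LARGE"),
    Entry("timehash", 4, "alt.*"),
    Entry("timehash", 5, "*"),
]


def select_entry(entries: List[Entry], groups: List[str],
                 size: int) -> Optional[Entry]:
    """Return the first entry whose rules all match, or None."""
    for entry in entries:
        if not any(fnmatchcase(group, entry.newsgroups) for group in groups):
            continue
        if size < entry.minsize or (entry.maxsize and size > entry.maxsize):
            continue
        return entry  # first match wins; later entries are never consulted
    return None


hit = select_entry(SAMPLE, ["alt.binaries.misc", "internal.test"], 80_000)
print(hit.method, hit.storage_class)                    # tradspool 1
if select_entry(SAMPLE[:-1], ["comp.lang.c"], 1_000) is None:
    print("cant store article: no matching entry in storage.conf")
```

Dropping the catch-all entry (SAMPLE[:-1]) shows why it is worth keeping. A small article in comp.lang.c then matches nothing and would be rejected.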
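As for poison wildmat expressions, suppose for instance that an article is
crossposted to misc.foo and misc.bar. The pattern:
misc.*,!misc.bar
will match that article, because misc.foo still matches "misc.*" and a
negated pattern only excludes the group it matches, whereas the pattern:
misc.*,@misc.bar
will not match that article, because a single poisoned group prevents the
article from being stored using this storage method. An article posted only
to misc.bar will fail to match either pattern.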
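The poison and exactmatch rules can also be sketched in Python. This is an approximation for illustration, and the names are hypothetical. Python's fnmatch.fnmatchcase stands in for uwildmat's own pattern syntax, so bracket negation and escaping differ. The treatment of "!" and "@" is an assumption based on libinn_uwildmat(3): "@" behaves like "!", except that the resulting failure is reported as poison. Edge cases may differ from the real library.

```python
from enum import Enum
from fnmatch import fnmatchcase
from typing import List


class Result(Enum):
    FAIL = 0
    MATCH = 1
    POISON = 2


def wildmat_poison(group: str, expr: str) -> Result:
    """Evaluate a comma-separated pattern list against one newsgroup name."""
    matched = False
    poisoned = False
    for pattern in expr.split(","):
        negated = pattern[:1] in ("!", "@")
        body = pattern[1:] if negated else pattern
        # A pattern is only tried if matching it would flip the current
        # result, so the last matching pattern decides the outcome.
        if matched == negated and fnmatchcase(group, body):
            matched = not negated
            poisoned = pattern.startswith("@")
    if poisoned:
        return Result.POISON
    return Result.MATCH if matched else Result.FAIL


def groups_match(groups: List[str], expr: str, exactmatch: bool = False) -> bool:
    """Decide whether an article posted to groups satisfies newsgroups: expr."""
    matched = False
    for group in groups:
        result = wildmat_poison(group, expr)
        if result is Result.POISON:
            return False   # one poisoned group rejects the whole entry
        if result is Result.MATCH:
            matched = True
        elif exactmatch:
            return False   # with exactmatch, every group must match
    return matched


crosspost = ["misc.foo", "misc.bar"]
print(groups_match(crosspost, "misc.*,!misc.bar"))     # True
print(groups_match(crosspost, "misc.*,@misc.bar"))     # False
print(groups_match(["misc.bar"], "misc.*,!misc.bar"))  # False
```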
Usually, high-volume groups and groups whose articles do not need to be kept
around very long (binaries groups, *.jobs*, news.lists.filters, etc.) are
stored in CNFS buffers. Use the other methods (or CNFS buffers again) for
everything else. However, it is often most convenient to keep special
hierarchies in "tradspool", such as local hierarchies, hierarchies that
should never expire, and hierarchies whose spool you need to go through
manually.
HISTORY
Written by Katsuhiro Kondou for InterNetNews. Rewritten into POD by Julien
Elie.
SEE ALSO
cycbuff.conf(5), expire.ctl(5), expireover(8), inn.conf(5), innd(8),
libinn_uwildmat(3).