innfeed, imapfeed - Multi-host, multi-connection, streaming NNTP feeder
innfeed [-ChmMvxyz] [-a spool-dir] [-b directory] [-c config-file]
[-d log-level] [-e bytes] [-l logfile] [-o bytes] [-p pid-file]
[-s command] [-S status-file] [file]
innfeed implements the NNTP protocol for transferring news between
computers. It handles the standard IHAVE protocol as well as the
CHECK/TAKETHIS streaming extension.
innfeed can feed any number of
remote hosts at once and will open multiple connections to each host if
configured to do so. The only limitations are the process limits for open file
descriptors and memory.
As an alternative to using NNTP, INN can also feed an IMAP server. This is
done with an executable called imapfeed, which is identical to innfeed
except for the delivery process. imapfeed uses two types of connections:
an LMTP connection to deliver regular messages and an IMAP connection to
handle control messages.
innfeed has three modes of operation: channel, funnel-file and batch.
Channel mode is used when no filename is given on the command line, the
input-file keyword is
not given in the config file,
and
the
-x option is
not given. In channel mode,
innfeed runs
with stdin connected via a pipe to
innd. Whenever
innd closes
this pipe (and it has several reasons during normal processing to do so),
innfeed will exit. It first will try to finish sending all articles it
was in the middle of transmitting, before issuing a QUIT command. This means
innfeed may take a while to exit depending on how slow your peers are.
It never (well, almost never) just drops the connection. The recommended way
to restart
innfeed when run in channel mode is therefore to tell
innd to close the pipe and spawn a new
innfeed process. This can
be done with "ctlinnd flush
feed" where
feed is the
name of the
innfeed channel feed in the
newsfeeds file.
Funnel-file mode is used when a filename is given as an argument or the
input-file keyword is given in the config file. In funnel-file mode, it
reads the specified file for the same formatted information as
innd
would give in channel mode. It is expected that
innd is continually
writing to this file, so when
innfeed reaches the end of the file, it
will check periodically for new information. To prevent the funnel file from
growing without bounds, you will need to periodically move the file to the
side (or simply remove it) and have
innd flush the file. Then, after
the file is flushed by
innd, you can send
innfeed a SIGALRM, and
it too will close the file and open the new file created by
innd.
Something like:
innfeed -p <pathrun in inn.conf>/innfeed.pid my-funnel-file &
while true; do
    sleep 43200
    rm -f my-funnel-file
    ctlinnd flush funnel-file-site
    kill -ALRM `cat <pathrun>/innfeed.pid`
done
Batch mode is used when the
-x flag is used. In batch mode,
innfeed will ignore stdin, and will simply process any backlog created
by a previously running
innfeed. This mode is not normally needed as
innfeed will take care of backlog processing.
innfeed expects a couple of things to be able to run correctly: a
directory where it can store backlog files and a configuration file to
describe which peers it should handle.
The configuration file is described in
innfeed.conf(5). The
-c
option can be used to specify a different file. For each peer (say,
"foo"),
innfeed manages up to 4 files in the backlog
directory:
- A foo.lock file, which prevents other instances of innfeed from
interfering with this one.
- A foo.input file which has old article information innfeed is reading
for re-processing.
- A foo.output file where innfeed is writing information on articles that
could not be processed (normally due to a slow or blocked peer).
- A foo file that is never created by innfeed, but if innfeed notices it,
it will rename it to foo.input at the next opportunity and will start
reading from it. This lets you create a batch file and put it in a place
where innfeed will find it.
You should never alter the
foo.input or
foo.output files of a
running
innfeed. The format of these last three files is one of the
following:
/path/to/article <message-id>
@token@ <message-id>
This is the same as the first two fields of the lines
innd feeds to
innfeed, and the same as the first two fields of the lines of the batch
file
innd will write if
innfeed is unavailable for some reason.
When
innfeed processes its own batch files, it ignores everything after
the first two whitespace separated fields, so moving the
innd-created
batch file to the appropriate spot will work, even though the lines have extra
fields.
The first field can also be a storage API token. The two types of lines can be
intermingled;
innfeed will use the storage manager if appropriate, and
otherwise treat the first field as a filename to read directly.
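As a rough sketch of preparing such a batch file by hand, the shell function below trims an innd-written batch file to its first two whitespace-separated fields and renames the result into place as the peer's foo file. The function name queue_batch_for_peer and its arguments are hypothetical, and the backlog directory is passed explicitly rather than looked up in inn.conf. Trimming is optional, since innfeed ignores the extra fields anyway.

```sh
# Hypothetical helper, not part of INN.
# usage: queue_batch_for_peer PEER BACKLOG_DIR INND_BATCH_FILE
queue_batch_for_peer() {
    peer=$1 dir=$2 src=$3
    tmp="$dir/.$peer.tmp.$$"
    # Keep only the first two whitespace-separated fields of each line:
    # the article path (or storage token) and the message-ID.
    awk 'NF >= 2 { print $1, $2 }' "$src" > "$tmp" || return 1
    # Refuse to overwrite a batch file that has not been picked up yet.
    if [ -e "$dir/$peer" ]; then
        rm -f "$tmp"
        return 1
    fi
    # Rename within the same directory so a partial file is never visible
    # under the peer's name.
    mv "$tmp" "$dir/$peer"
}
```

For a peer named foo this might be called as `queue_batch_for_peer foo <backlog-dir> <innd-batch-file>`, where both paths depend on your installation.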
innfeed writes its current status to the file
innfeed.status (or
the file given by the
-S option). This file contains details on the
process as a whole, and on each peer this instance of
innfeed is
managing.
If
innfeed is told to send an article to a host it is not managing, then
the article information will be put into a file matching the pattern
innfeed-dropped.*, with part of the file name matching the pid of the
innfeed process that is writing to it.
innfeed will not process
this file except to write to it. If nothing is written to the file, then it
will be removed if
innfeed exits normally. Otherwise, the file remains,
and
procbatch can be invoked to process it afterwards.
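To find leftover files that still need attention, something like the following sketch could be used. The function name list_dropped_files is hypothetical, and the directory to search is passed as an argument because it depends on how your innfeed is set up.

```sh
# Hypothetical helper: print the non-empty innfeed-dropped.* files in DIR.
# usage: list_dropped_files DIR
list_dropped_files() {
    for f in "$1"/innfeed-dropped.*; do
        # -s is true only for files that exist and are not empty; this also
        # skips the literal pattern when nothing matches.
        [ -s "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Each file it prints is a candidate for processing with procbatch(8).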
Upon receipt of a SIGALRM,
innfeed will close the funnel file specified
on the command line, and will reopen it (see funnel file description above).
innfeed will catch SIGINT and will write a large debugging snapshot of
the state of the running system.
innfeed will catch SIGHUP and will reload both the config and the log
files. See
innfeed.conf(5) for more details.
innfeed will catch SIGCHLD and will close and reopen all backlog files.
innfeed will catch SIGTERM and will do an orderly shutdown.
Upon receipt of a SIGUSR1, innfeed will increment the debugging level by
one; receipt of a SIGUSR2 will decrement it by one. Unless the -d option is
used, the debugging level starts at zero, at which level no debugging
information is emitted. A larger value for the level means more debugging
information. Numbers up to 5 are currently useful.
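The signals described above can be summarised in a small lookup function. This is only a convenience sketch: the function name innfeed_signal and the action names are made up for illustration, while the signal names come from the descriptions above.

```sh
# Hypothetical helper: map a descriptive action name to the signal that
# triggers it in innfeed.
innfeed_signal() {
    case $1 in
        reopen-funnel)  echo ALRM ;;   # close and reopen the funnel file
        snapshot)       echo INT ;;    # write a debugging snapshot
        reload)         echo HUP ;;    # reload config and log files
        reopen-backlog) echo CHLD ;;   # close and reopen backlog files
        shutdown)       echo TERM ;;   # orderly shutdown
        debug-up)       echo USR1 ;;   # debugging level + 1
        debug-down)     echo USR2 ;;   # debugging level - 1
        *)              echo "unknown action: $1" >&2; return 1 ;;
    esac
}
```

A reload could then be requested with `kill -s "$(innfeed_signal reload)" "$(cat <pathrun>/innfeed.pid)"`, with the pid file path adjusted to match the -p option in use.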
There are 3 different categories of syslog entries for statistics: host,
connection and global.
The host statistics are generated for a given peer at regular intervals after
the first connection is made (or, if the remote is unreachable, after spooling
starts). The host statistics give totals over all connections that have been
active during the given time frame. For example (broken here to fit the page,
with "vixie" being the peer):
May 23 12:49:08 news innfeed[16015]: vixie checkpoint
seconds 1381 offered 2744 accepted 1286 refused 1021
rejected 437 missing 0 accsize 8506220 rejsize 142129
spooled 990 on_close 0 unspooled 240 deferred 10/15.3
requeued 25 queue 42.1/100:14,35,13,4,24,10
The meanings of these fields are:
- seconds
- The time since innfeed connected to the host or
since the statistics were reset by a "final" log entry.
- offered
- The number of IHAVE commands sent to the host if it is not
in streaming mode. In streaming mode, the number of CHECK commands sent
(when no-CHECK mode is not in effect) plus the number of TAKETHIS commands
sent while no-CHECK mode is in effect.
- accepted
- The number of articles which were sent to the remote host
and accepted by it.
- refused
- The number of articles offered to the host that it
indicated it did not want because it had already seen the message-ID. The
remote host indicates this by sending a 435 response to an IHAVE command
or a 438 response to a CHECK command.
- rejected
- The number of articles transferred to the host that it did
not accept because it determined either that it already had the article or
it did not want it because of the article's Newsgroups or Distribution
header fields, etc. The remote host indicates that it is rejecting the
article by sending a 437 or 439 response after innfeed sent the
entire article.
- missing
- The number of articles which innfeed was told to
offer to the host but which were not present in the article spool. These
articles were probably cancelled or expired before innfeed was able
to offer them to the host.
- accsize
- The number of bytes of all accepted articles transferred to
the host.
- rejsize
- The number of bytes of all rejected articles transferred to
the host.
- spooled
- The number of article entries that were written to the
.output backlog file because the articles either could not be sent
to the host or were refused by it. Articles are generally spooled either
because new articles are arriving more quickly than they can be offered to
the host, or because innfeed closed all the connections to the host
and pushed all the articles currently in progress to the .output
backlog file.
- on_close
- The number of articles that were spooled when
innfeed closed all the connections to the host.
- unspooled
- The number of article entries that were read from the
.input backlog file.
- deferred
- The first number is the number of articles that the host
told innfeed to retry later by sending a 431 or 436 response.
innfeed immediately puts these articles back on the tail of the
queue.
The second number is the average (mean) size of deferred articles during the
previous logging interval.
- requeued
- The number of articles that were in progress on connections
when innfeed dropped those connections and put the articles back on
the queue. These connections may have been broken by a network problem, or
may have become unresponsive, causing innfeed to time them out.
- queue
- The first number is the average (mean) queue size during
the previous logging interval. The second number is the maximum allowable
queue size. The third number is the percentage of the time that the queue
was empty. The fourth through seventh numbers are the percentages of the
time that the queue was >0% to 25% full, 25% to 50% full, 50% to 75%
full, and 75% to <100% full. The last number is the percentage of the
time that the queue was totally full.
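Because each value follows its field name, a checkpoint entry is easy to pick apart with awk. The two functions below are a minimal sketch with hypothetical names (stat_field, accept_rate). They read one log entry on standard input and assume it is a single line, as written by syslog, not wrapped as in the example above.

```sh
# Hypothetical helpers for reading innfeed checkpoint entries.

# stat_field NAME: print the value that follows the field NAME.
stat_field() {
    awk -v name="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == name) { print $(i + 1); exit }
    }'
}

# accept_rate: print accepted/offered as a percentage with one decimal.
# Prints nothing if no articles were offered.
accept_rate() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "offered")  offered  = $(i + 1)
            if ($i == "accepted") accepted = $(i + 1)
        }
        if (offered > 0) printf "%.1f\n", 100 * accepted / offered
    }'
}
```

Fed the "vixie" entry shown earlier, `stat_field queue` would print the raw queue value and `accept_rate` the share of offered articles that were accepted. A peer whose name happens to equal a field name would confuse this simple matching.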
If the
-z option is used (see below), then when the peer stats are
generated, each connection will log its stats too. For example, for connection
number zero (from a set of five):
May 23 12:49:08 news innfeed[16015]: vixie:0 checkpoint
seconds 1381 offered 596 accepted 274 refused 225
rejected 97 accsize 773623 rejsize 86591
If you only open a maximum of one connection to a remote, then there will be a
close correlation between connection numbers and host numbers, but in general
you cannot tie the two sets of numbers together in any easy or very meaningful
way. When a connection closes, it will always log its stats.
If all connections for a host get closed together, then the host logs its stats
as "final" and resets its counters. If the feed is so busy that
there is always at least one connection open and running, then after some
amount of time (set via the config file), the host stats are logged as final
and reset. This makes it easier for other programs to generate higher-level
stats from the log files.
There is one log entry that is emitted for a host just after its last connection
closes and
innfeed is preparing to exit. This entry contains counts
over the entire life of the process. The "seconds" field is from the
first time a connection was successfully built, or the first time spooling
started. If a host has been completely idle, it will have no such log entry.
May 23 12:49:08 news innfeed[16015]: decwrl global
seconds 1381 offered 34 accepted 22 refused 3 rejected 7
missing 0 accsize 81277 rejsize 12738 spooled 0 unspooled 0
The final log entry is emitted immediately before exiting. It contains a summary
of the statistics over the entire life of the process.
Feb 13 14:43:41 news innfeed[22344]: ME global
seconds 15742 offered 273441 accepted 45750 refused 222008
rejected 3334 missing 217 accsize 93647166 rejsize 7421839
spooled 10 unspooled 0
innfeed takes the following options.
- -a spool-dir
- The -a flag is used to specify the top of the
article spool tree. innfeed does a chdir(2) to this
directory, so it should probably be an absolute path. The default is
patharticles as set in inn.conf.
- -b directory
- The -b flag may be used to specify a different
directory for backlog file storage and retrieval, as well as for lock
files. If the path is relative, then it is relative to pathspool as
set in inn.conf. The default is "innfeed".
- -c config-file
- The -c flag may be used to specify a different
config file from the default value. If the path is relative, then it is
relative to pathetc as set in inn.conf. The default is
innfeed.conf.
- -C
- The -C flag is used to have innfeed simply
check the config file, report on any errors and then exit.
- -d log-level
- The -d flag may be used to specify the initial
logging level. All debugging messages go to stderr (which may not be what
you want, see the -l flag below).
- -e bytes
- The -e flag may be used to specify the size limit
(in bytes) for the .output backlog files innfeed creates. If
the output file gets bigger than 10% more than the given number,
innfeed will replace the output file with the tail of the original
version. The default value is 0, which means there is no limit.
- -h
- Use the -h flag to print the usage message.
- -l logfile
- The -l flag may be used to specify a different log
file from stderr. As innd starts innfeed with stderr
attached to /dev/null, using this option can be useful in catching any
abnormal error messages, or any debugging messages (all "normal"
error messages go to syslog).
- -m
- The -m flag is used to turn on logging of all
missing articles. Normally, if an article is missing, innfeed keeps
a count, but logs no further information. When this flag is used, details
about message-IDs and expected path names are logged.
- -M
- If innfeed has been built with mmap support, then
the -M flag turns OFF the use of mmap(); otherwise, it has
no effect.
- -o bytes
- The -o flag sets the maximum number of bytes of article
data innfeed is supposed to keep in memory. This does not work properly
yet.
- -p pid-file
- The -p flag is used to specify the file name to
write the pid of the process into. A relative path is relative to
pathrun as set in inn.conf. The default is
innfeed.pid.
- -s command
- The -s flag specifies the name of a command to run
in a subprocess and read article information from. This is similar to
channel mode operation, only that command takes the place usually
occupied by innd.
- -S status-file
- The -S flag specifies the name of the file to write
the periodic status to. If the path is relative, it is considered relative
to pathlog as set in inn.conf. The default is
innfeed.status.
- -v
- When the -v flag is given, version information is
printed to stderr and then innfeed exits.
- -x
- The -x flag is used to tell innfeed not to
expect any article information from innd but just to process any
backlog files that exist and then exit.
- -y
- The -y flag is used to allow dynamic peer binding.
If this flag is used and article information is received from innd
that specifies an unknown peer, then the peer name is taken to be the IP
name too, and an association with it is created. Using this, it is
possible to only have the global defaults in the innfeed.conf file,
provided the peer name as used by innd is the same as the IP name.
Note that when innfeed is run with -y and no peer is defined in
innfeed.conf, it drops the first article.
- -z
- The -z flag is used to cause each connection, in a
parallel feed configuration, to report statistics when the controller for
the connections prints its statistics.
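As a small worked example of the -e description above, the size at which a .output file would be cut back can be estimated from the limit. The function name output_trim_threshold is hypothetical, and the arithmetic simply takes "10% more than the given number" literally with integer division; the exact computation inside innfeed may differ.

```sh
# Hypothetical helper: given the -e limit in bytes, print the approximate
# size above which the .output file is replaced by its tail.
# A limit of 0 means "no limit", so nothing is printed in that case.
output_trim_threshold() {
    limit=$1
    [ "$limit" -gt 0 ] || return 0
    echo $(( limit + limit / 10 ))
}
```

With `-e 1000000`, for instance, the file would be allowed to grow to roughly 1100000 bytes before being trimmed.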
When using the -x option, the config file entry's initial-connections field
will be the total number of connections created and used, no matter how big
the batch file is, and no matter what the max-connections field specifies.
Thus a value of 0 for initial-connections means nothing will happen in
-x mode.
innfeed does not automatically grab the file out of
pathoutgoing.
This needs to be prepared for it by external means.
Probably too many other bugs to count.
An alternative to innfeed can be innduct, maintained by Ian Jackson and
available at
<http://www.chiark.greenend.org.uk/ucgi/~ian/git-manpage/innduct.git/innduct.8>.
It is intended to solve a design issue in the way innfeed works: the
program feed protocol spoken between innd and innfeed is lossy. If innfeed
dies unexpectedly, articles which innd has written to the pipe to innfeed
will be skipped. innd has no way of telling which articles those are, keeps
no useful records of them, and makes no attempt to resend them.
- pathbin/innfeed
- The binary program itself.
- pathetc/innfeed.conf
- The configuration file.
- pathspool/innfeed
- The directory for backlog files.
Written by James Brister for InterNetNews. Converted to POD by Julien Elie.
Earlier versions of
innfeed (up to 0.10.1) were shipped separately;
innfeed is now part of INN and shares the same version number.
ctlinnd(8),
inn.conf(5),
innfeed.conf(5),
innd(8),
procbatch(8).