recon - Check if LAM can be started.
recon [-a] [-b] [-d] [-h] [-v] [-nn] [-np] [-ssi key value] [bhost]
- -a
- Report all host errors.
- -b
- Assume the local and remote shells are the same. This means that
only one remote shell invocation is made to each node. If -b is not
used, two remote shell invocations are made to each node.
- -d
- Turn on debugging.
- -h
- Print the command help menu.
- -ssi key value
- Send arguments to various SSI modules. See the
"SSI" section, below.
- -v
- Be verbose.
- -nn
- Don't add "-n" to the remote agent command line.
- -np
- Do not force the execution of $HOME/.profile on remote hosts.
In order for LAM to be started on a remote UNIX machine, several requirements
have to be fulfilled:
- 1)
- The machine must be reachable via the network.
- 2)
- The user must be able to execute commands remotely on the machine
with the default remote shell program that was chosen when LAM was
configured. This is usually rsh(1), but any remote shell program is
acceptable (such as ssh(1)). Note that remote host permissions must
be configured so that the remote shell program does not ask for a
password when a command is invoked on the remote host.
- 3)
- The remote user's shell must have a search path that will
locate LAM executables.
- 4)
- The remote shell's startup file must not print anything to
standard error when invoked non-interactively.
If any of these requirements is not met for any machine declared in bhost,
LAM will not be able to start. By running recon first, the user can quickly
identify and correct problems in the setup that would prevent LAM from
starting.
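The four requirements above can also be probed by hand for a single host. The following is only a rough sketch, not what recon itself runs: `check_lam_host` and the `REMOTE_AGENT` variable are hypothetical names, the agent defaults to rsh as described above, and the remote login shell is assumed to understand the POSIX `command -v` builtin.

```shell
# Hypothetical helper: probe one host against the requirements listed above.
# REMOTE_AGENT is an illustrative variable, not a LAM setting.
check_lam_host() {
    host=$1
    agent=${REMOTE_AGENT:-rsh}
    errfile=$(mktemp) || return 1

    # Requirements 1-3: the host is reachable, runs a command without
    # prompting, and finds tkill on the remote search path. The output is
    # inspected instead of the exit status, because classic rsh does not
    # pass the remote exit status back.
    path=$($agent "$host" 'command -v tkill' 2>"$errfile") || path=
    if [ -z "$path" ]; then
        echo "$host: remote execution failed or tkill not on PATH" >&2
        rm -f "$errfile"
        return 1
    fi

    # Requirement 4: nothing may be written to standard error.
    if [ -s "$errfile" ]; then
        echo "$host: remote shell wrote to standard error" >&2
        rm -f "$errfile"
        return 1
    fi

    rm -f "$errfile"
    return 0
}
```

For example, `REMOTE_AGENT="ssh -x" check_lam_host node1` would probe a host named node1 (an illustrative name) through ssh.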
The local machine where recon is invoked must be one of the machines
specified in bhost.
The bhost file is a LAM boot schema written in the host file syntax. See
bhost(5). If no boot schema is given on the command line, one can be
specified in the LAMBHOST environment variable. Otherwise a default file,
bhost.def, is used. LAM searches for bhost first in the local directory and
then in the installation directory under etc/.
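The lookup order just described can be sketched as a small shell function. This is an illustration of the described behavior, not LAM's own code: `resolve_bhost` and `LAM_ETC_DIR` (standing in for the etc/ directory of the installation) are hypothetical names, and the default file name is taken from the paragraph above.

```shell
# Sketch of the boot schema lookup described above.
# LAM_ETC_DIR is a hypothetical stand-in for "<installation directory>/etc".
resolve_bhost() {
    # Command-line argument first, then $LAMBHOST, then the default name.
    name=${1:-${LAMBHOST:-bhost.def}}

    case $name in
    /*)
        # Absolute path: use it as given.
        if [ -f "$name" ]; then
            echo "$name"
            return 0
        fi
        ;;
    *)
        # Relative name: local directory first, then the etc/ directory.
        for dir in . ${LAM_ETC_DIR:+"$LAM_ETC_DIR"}; do
            if [ -f "$dir/$name" ]; then
                echo "$dir/$name"
                return 0
            fi
        done
        ;;
    esac

    echo "boot schema '$name' not found" >&2
    return 1
}
```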
recon tests each machine defined in bhost by attempting to execute the
tkill(1) command on it using its "pretend" option (no action is taken).
This test, if successful, indicates that all the requirements listed above
are met, and thus LAM can be started on the machine. If the attempt is
successful, the next machine is checked. If the attempt fails, a descriptive
error message is displayed and recon stops, unless the -a option is used, in
which case recon continues checking the remaining machines.
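The control flow described above (probe each host in turn, stop at the first failure unless -a is given) might be sketched as follows. The names `recon_sketch` and `probe_host` are hypothetical, the probe shown is a placeholder that uses ssh in batch mode and does not reproduce recon's real tkill test, and the schema parsing assumes that the first field of each non-comment line is a host name.

```shell
# Placeholder per-host test; recon's real probe runs tkill(1) in pretend mode.
# Using ssh with BatchMode here is an assumption made for illustration.
probe_host() {
    ssh -o BatchMode=yes "$1" true
}

# Sketch of recon's host loop. First argument: "yes" mimics -a.
recon_sketch() {
    check_all=$1
    schema=$2
    failures=0

    # Take the first field of every non-blank, non-comment line as a host.
    for host in $(awk 'NF && $1 !~ /^#/ { print $1 }' "$schema"); do
        if probe_host "$host"; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failures=$((failures + 1))
            # Without -a, stop at the first failing host.
            [ "$check_all" = yes ] || break
        fi
    done

    [ "$failures" -eq 0 ]
}
```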
If recon takes a long time to finish successfully, this is a good indication
that the LAM system to be started has slow communication links or heavily
loaded machines, and it might be preferable to exclude or replace some of
the machines in the system.
The -ssi switch allows the passing of parameters to various SSI modules.
LAM's SSI modules are described in detail in lamssi(7). SSI modules have a
direct impact on MPI programs because they allow tunable parameters to be
set at run time (such as which boot device driver to use, what parameters to
pass to that driver, etc.).
The -ssi switch takes two arguments: key and value. The key argument
generally specifies which SSI module will receive the value. For example,
the key "boot" is used to select which boot module is used for starting
processes on remote nodes. The value argument is the value that is passed.
For example:
- recon -ssi boot tm
- Tells LAM to use the "tm" boot module for native
launching in PBSPro / OpenPBS environments (the tm boot module does not
require a boot schema).
- recon -ssi boot rsh -ssi rsh_agent "ssh -x" boot_file
- Tells LAM to use the "rsh" boot module, and tells
the rsh module to use "ssh -x" as the specific agent to launch
executables on remote nodes.
And so on. LAM's boot SSI modules are described in lamssi_boot(7). That page
should be consulted for the specific actions taken by each boot module and
for how to tweak its run-time behavior.
The -ssi switch can be used multiple times to specify different key and/or
value arguments. If the same key is specified more than once, the values are
concatenated with a comma (",") separating them.
Note that the -ssi switch is simply a shortcut for setting environment
variables. The same effect may be accomplished by setting the corresponding
environment variables before running recon. The environment variables that
LAM sets have the form:
LAM_MPI_SSI_key=value.
Note that the -ssi switch overrides any previously set environment
variables. Also note that unknown key arguments are still set as environment
variables; they are not checked (by recon) for correctness. Illegal or
incorrect value arguments may or may not be reported, depending on the
specific SSI module.
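To make the mapping from -ssi arguments to environment variables concrete, here is a small sketch that prints the assignments that a list of key/value pairs corresponds to, including the comma concatenation of repeated keys. `ssi_to_env` is a hypothetical helper written for this illustration; it is not part of LAM and only mirrors the rules stated above.

```shell
# Hypothetical helper: print LAM_MPI_SSI_key=value for each distinct key.
# Arguments are key/value pairs, i.e. what would follow each -ssi switch.
ssi_to_env() {
    while [ "$#" -ge 2 ]; do
        printf '%s\t%s\n' "$1" "$2"
        shift 2
    done | awk -F '\t' '
        # First time a key is seen: remember its position and value.
        !($1 in val) { order[++n] = $1; val[$1] = $2; next }
        # Repeated key: append the new value after a comma.
        { val[$1] = val[$1] "," $2 }
        END {
            for (i = 1; i <= n; i++)
                printf "LAM_MPI_SSI_%s=%s\n", order[i], val[order[i]]
        }
    '
}
```

Calling `ssi_to_env boot rsh rsh_agent "ssh -x"` prints `LAM_MPI_SSI_boot=rsh` and `LAM_MPI_SSI_rsh_agent=ssh -x`. Going by the description above, exporting those two variables before running recon should have the same effect as passing the corresponding -ssi switches.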
All tweakable aspects of launching executables on remote nodes during recon
are discussed in lamssi(7) and lamssi_boot(7). Topics include (but are not
limited to): discovery of the remote shell, run-time overrides of the agent
used to launch remote executables (e.g., rsh and ssh), etc.
- laminstalldir/etc/lam-bhost.def
- default boot schema file, where "laminstalldir"
is the directory where LAM/MPI was installed.
- recon -v mynodes
- Check if LAM can be started on all the UNIX machines
described in the boot schema mynodes. Report about important steps
as they are done.
- recon -v -a
- Check if LAM can be started on all the UNIX machines
described in the default boot schema. Report about important steps as they
are done. Check all the machines; do not stop after the first error
message.
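A common pattern is to run recon as a gate before lamboot(1). The wrapper below is a sketch only: `start_lam` is a hypothetical name, and it assumes that recon exits with a non-zero status when any host fails its check, which this page does not state explicitly.

```shell
# Hypothetical wrapper: boot LAM only if recon finds no problems.
# Assumes recon returns a non-zero exit status on failure.
start_lam() {
    # With no argument, both commands fall back to the default boot schema.
    if recon -a -v ${1:+"$1"}; then
        lamboot -v ${1:+"$1"}
    else
        echo "recon reported problems; not running lamboot" >&2
        return 1
    fi
}
```

For example, `start_lam mynodes` checks every machine in the boot schema mynodes and starts LAM only if all checks pass.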
rsh(1),
tkill(1),
bhost(5),
lamboot(1),
lamwipe(1),
lam-helpfile(5),
lamssi(7),
lamssi_boot(7)