bhost - LAM boot schema (host file) format
#
# comments
#
<machine> [cpu=<cpucount>] [user=<userid>]
<machine> [cpu=<cpucount>] [user=<userid>]
...
A boot schema describes the machines that will combine to form a multicomputer
running LAM. It is used by recon(1) to verify initial conditions for running
LAM, by lamboot(1) to start LAM, and by lamhalt(1) to terminate LAM (note that
lamwipe(1) has been deprecated in favor of the lamhalt(1) command).
The particular syntax of a LAM boot schema is sometimes called the "host
file" syntax. It is line oriented. Each line gives the name of a machine
(typically its full Internet domain name), optionally followed by the number
of CPUs available on that machine and the userid with which to access it.
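The line format described above can be sketched as a small parser. This is only an illustration, not LAM's own parser: the function name parse_bhost_line is hypothetical, and it assumes that a "#" starts a comment anywhere on a line and that every token after the machine name has the form key=value.

```python
def parse_bhost_line(line):
    """Parse one boot schema line into (machine, cpu, other_options).

    Returns None for blank and comment-only lines. Illustrative only:
    this is not LAM's parser and covers just the simple key=value form.
    """
    # Assumption: "#" starts a comment that runs to the end of the line.
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    machine, *tokens = text.split()
    options = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        options[key] = value
    # A CPU count of 1 is assumed when cpu= is not given.
    cpu = int(options.pop("cpu", 1))
    return machine, cpu, options


print(parse_bhost_line("beowulf1.cluster.example.com cpu=2"))
print(parse_bhost_line("somewhere.else.example.com user=guest"))
```

Options other than cpu are returned untouched in a dictionary, so the sketch makes no claim about which keys LAM accepts.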
Common boot schemas for a particular site may be created by the system
administrator and placed in the installation directory under etc/. Their
names typically start with the prefix bhost. Individual users usually create
their own boot schemas, especially if the configurations are simple.
Note that lamboot resolves all names listed in bhost on the node on which
lamboot was invoked. The lamboot(1) man page contains information about
address resolution, examples of how to handle multiple network interface
cards (NICs) in a node, etc.
Here is an example boot schema that lists four machines:
#
# example LAM host file
#
server.cluster.example.com schedule=no
beowulf1.cluster.example.com cpu=2
beowulf2.cluster.example.com
beowulf2.cluster.example.com
somewhere.else.example.com user=guest
Note that the "guest" ID is significant, since the user has an alternate
login ID on somewhere.else.example.com. Additionally, note that beowulf1 has
a CPU count of 2 listed (a CPU count of 1 is assumed if it is not given).
This value is used by mpirun(1), MPI_Comm_spawn(2), and
MPI_Comm_spawn_multiple(2) for the "C" (or CPU) notation that specifies how
many ranks to start. This is particularly useful for running on SMP machines.
Note the schedule=no clause. This means that LAM will boot a daemon on that
node but, by default, will not launch any MPI processes on it. This is handy
when you want to control your MPI applications from one node (e.g., a server)
but don't want to run any MPI processes there. In some environments this is
the default (e.g., BProc). See the LAM User's Guide for more details.
beowulf2 is listed twice, but has no specific CPU count listed. In this
case, LAM will keep a running tally of the total number of CPUs for that host.
Hence, LAM will calculate that beowulf2 has two CPUs available for use.
Calculating the number of CPUs by counting occurrences of a hostname is useful
in a batch environment where a hostfile may list the same hostname multiple
times, indicating that the batch scheduler has allocated multiple CPUs for a
single job (e.g., PBS operates this way).
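To illustrate the tallying rule, the sketch below collapses a scheduler-style list of repeated hostnames into boot schema lines with explicit cpu= counts, which should describe the same CPU totals. The function name collapse_hosts and the node names are hypothetical, and the code is not part of LAM.

```python
def collapse_hosts(hostnames):
    """Collapse repeated hostnames into 'host cpu=N' boot schema lines."""
    counts = {}
    for name in hostnames:
        # Each occurrence of a hostname counts as one CPU.
        # dicts preserve insertion order, so hosts stay in first-seen order.
        counts[name] = counts.get(name, 0) + 1
    return [name if n == 1 else f"{name} cpu={n}" for name, n in counts.items()]


# Hypothetical allocation: the scheduler granted two CPUs on node1, one on node2.
allocated = ["node1.example.com", "node1.example.com", "node2.example.com"]
print("\n".join(collapse_hosts(allocated)))
```

Since LAM keeps the tally itself, this conversion is not required; it only makes the resulting CPU counts explicit.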
For the above-mentioned schema, the command "mpirun C foo" would start
five instances of the foo program: two on beowulf1, two on beowulf2, and one
on somewhere.else. None would start on server, because of its schedule=no
clause.
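The count of five can be checked with a short sketch that applies the rules described above to the example schema: skip schedule=no hosts, default to one CPU, and sum the counts of repeated hosts. The tuple layout and the function name processes_per_host are assumptions made for illustration; this is not how mpirun is implemented.

```python
# (machine, cpu count, schedulable) for each line of the example schema.
EXAMPLE = [
    ("server.cluster.example.com", 1, False),   # schedule=no
    ("beowulf1.cluster.example.com", 2, True),  # cpu=2
    ("beowulf2.cluster.example.com", 1, True),
    ("beowulf2.cluster.example.com", 1, True),  # listed twice
    ("somewhere.else.example.com", 1, True),    # user=guest
]


def processes_per_host(entries):
    """Return {machine: process count} for the "C" notation."""
    totals = {}
    for machine, cpu, schedulable in entries:
        if not schedulable:
            continue  # a daemon is booted here, but no MPI processes run
        totals[machine] = totals.get(machine, 0) + cpu
    return totals


placement = processes_per_host(EXAMPLE)
for machine, count in placement.items():
    print(machine, count)
print("total:", sum(placement.values()))
```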
$LAMHOME/etc/bhost.def - default boot schema file
LAM User's Guide,
lamboot(1),
lamhalt(1),
mpirun(1),
MPI_Comm_spawn(1),
MPI_Comm_spawn_multiple(1),
recon(1),
lamwipe(1)