slurmd - The compute node daemon for Slurm.
slurmd [OPTIONS...]
slurmd is the compute node daemon of Slurm. It monitors all tasks running
on the compute node, accepts work (tasks), launches tasks, and kills running
tasks upon request.
- -b
- Report node rebooted when daemon restarted. Used for
testing purposes.
-
- -c
- Clear system locks as needed. This may be required if
slurmd terminated abnormally.
-
- -C
- Print the actual hardware configuration (not the
configuration from the slurm.conf file) and exit. The format of output is
the same as used in slurm.conf to describe a node's configuration
plus its uptime.
-
- --conf <node parameters>
- Used in conjunction with the -Z option. Used to
override or define additional parameters of a dynamic node using the same
syntax and parameters used to define nodes in the slurm.conf. Specifying
any of CPUs, Boards, SocketsPerBoard,
CoresPerSocket or ThreadsPerCore will override the defaults
defined by the -C option. NodeName and Port are not
supported.
For example, if slurmd -C reports
NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848
the following --conf specifications will generate the corresponding node
definitions:
--conf "Gres=gpu:2"
NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848 Gres=gpu:2
--conf "RealMemory=30000"
NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=30000
--conf "CPUs=16"
NodeName=node1 CPUs=16 RealMemory=31848
--conf "CPUs=16 RealMemory=30000 Gres=gpu:2"
NodeName=node1 CPUs=16 RealMemory=30000 Gres=gpu:2
-
- --conf-server <host>[:<port>]
- Comma-separated list of controllers, the first being the
primary slurmctld. A port can (optionally) be specified for each
controller. These hosts are where the slurmd will fetch the configuration
from when running in "configless" mode.
-
- -d <file>
- Specify the fully qualified pathname to the
slurmstepd program to be used for shepherding user job steps. This
can be useful for testing purposes.
-
- -D
- Run slurmd in the foreground. Error and debug messages will
be copied to stderr.
-
- -f <file>
- Read configuration from the specified file. See
NOTES below.
-
- -F[feature]
- Start this node as a Dynamic Future node. It will try to
match a node definition with a state of FUTURE, optionally using
the specified feature to match the node definition.
-
- -G
- Print Generic RESource (GRES) configuration (based upon
slurm.conf GRES merged with gres.conf contents for this node) and
exit.
-
- -h
- Help; print a brief summary of command options.
-
- -L <file>
- Write log messages to the specified file.
-
- -M
- Lock slurmd pages into system memory using mlockall(2) to
disable paging of the slurmd process. This may help in cases where nodes
are marked DOWN during periods of heavy swap activity. If the mlockall(2)
system call is not available, an error will be printed to the log and
slurmd will continue as normal.
It is suggested to set LaunchParameters=slurmstepd_memlock in
slurm.conf(5) when setting -M.
-
- -n <value>
- Set the daemon's nice value to the specified value,
typically a negative number. Also note the PropagatePrioProcess
configuration parameter.
-
- -N <nodename>
- Run the daemon with the given nodename. Used to emulate a
larger system with more than one slurmd daemon per node. Requires that
Slurm be built using the --enable-multiple-slurmd configure option.
-
- -s
- Change working directory of slurmd to SlurmdLogFile path if
possible, or to SlurmdSpoolDir otherwise. If both of them fail it will
fall back to /var/tmp.
-
- -v
- Verbose operation. Multiple -v's increase verbosity.
-
- -V, --version
- Print version information and exit.
-
- -Z
- Start this node as a Dynamic Normal node. If no
--conf is specified, then the slurmd will register with the same
hardware configuration as defined by the -C option.
-
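The --conf override behavior described above can be sketched in a few lines of Python. This is an illustrative model of the rules stated in this page, not slurmd's actual implementation: the function names `parse_params` and `apply_conf` are hypothetical, and the input line is the example `slurmd -C` output shown under --conf.

```python
# Illustrative model only; parse_params and apply_conf are hypothetical names.
TOPOLOGY_KEYS = ("CPUs", "Boards", "SocketsPerBoard",
                 "CoresPerSocket", "ThreadsPerCore")
UNSUPPORTED_KEYS = ("NodeName", "Port")


def parse_params(text):
    """Split 'Key=Value Key=Value ...' into a dict that keeps input order."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected Key=Value, got {token!r}")
        params[key] = value
    return params


def apply_conf(detected_line, conf):
    """Merge --conf overrides into the node line reported by 'slurmd -C'."""
    detected = parse_params(detected_line)
    overrides = parse_params(conf)
    for key in UNSUPPORTED_KEYS:
        if key in overrides:
            raise ValueError(f"{key} is not supported in --conf")

    # Naming any topology parameter discards all detected topology defaults.
    drop_topology = any(key in overrides for key in TOPOLOGY_KEYS)

    merged = {}
    for key, value in detected.items():
        if key in overrides:
            merged[key] = overrides[key]      # override keeps its position
        elif drop_topology and key in TOPOLOGY_KEYS:
            continue                          # detected default is dropped
        else:
            merged[key] = value
    for key, value in overrides.items():
        merged.setdefault(key, value)         # new parameters go at the end
    return " ".join(f"{key}={value}" for key, value in merged.items())


DETECTED = ("NodeName=node1 CPUs=16 Boards=1 SocketsPerBoard=1 "
            "CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848")
```

Calling `apply_conf(DETECTED, "CPUs=16")` drops the detected Boards, SocketsPerBoard, CoresPerSocket and ThreadsPerCore values and keeps only the CPUs setting that was named, while `apply_conf(DETECTED, "Gres=gpu:2")` leaves the detected topology untouched and appends the new parameter.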
The following environment variables can be used to override settings compiled
into slurmd.
- SLURM_CONF
- The location of the Slurm configuration file. This is
overridden by explicitly naming a configuration file on the command
line.
-
- SLURM_DEBUG_FLAGS
- Specify debug flags for slurmd to use. See DebugFlags in
the slurm.conf(5) man page for a full list of flags. The
environment variable takes precedence over the setting in the
slurm.conf.
-
slurmd responds to the following signals.
- SIGTERM SIGINT
- slurmd will shut down cleanly, waiting for
in-progress rollups to finish.
-
- SIGHUP
- Reloads the Slurm configuration files, similar to 'scontrol
reconfigure'.
-
- SIGUSR2
- Reread the log level from the configs, and then reopen the
log file. This should be used when setting up logrotate(8).
-
- SIGPIPE
- This signal is explicitly ignored.
-
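As a rough illustration of how an administrative script might use the signals listed above, the following Python sketch maps an action name to the corresponding signal and delivers it with os.kill. The action names and the `signal_daemon` helper are hypothetical, and how the daemon's process ID is obtained (for example from a pid file) is site-specific and left to the caller.

```python
import os
import signal

# Action names are hypothetical labels; the signals are those listed above.
SIGNAL_FOR_ACTION = {
    "shutdown": signal.SIGTERM,      # clean shutdown
    "reconfigure": signal.SIGHUP,    # reload configuration files
    "reopen_logs": signal.SIGUSR2,   # reread log level, reopen log file
}


def signal_daemon(pid, action):
    """Send the signal for the named action to the process with this pid."""
    sig = SIGNAL_FOR_ACTION[action]
    os.kill(pid, sig)
    return sig
```

A log rotation hook could, under these assumptions, call `signal_daemon(pid, "reopen_logs")` after moving the old log file aside.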
If slurmd is started with the -D option, the core file will be written to
the current working directory. Otherwise, if SlurmdLogFile is a fully
qualified path name (starting with a slash), the core file will be written to
the same directory as the log file. Otherwise, the core file will be written
to the SlurmdSpoolDir directory, or "/var/tmp/" as a last resort. If none of
the above directories can be written, no core file will be produced.
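The selection order in the paragraph above can be modelled as a short preference list. The Python sketch below is one reading of that paragraph, not slurmd's code: the function name `core_file_dir` and its parameters are hypothetical, and it assumes that in foreground mode only the current working directory is considered, while in daemon mode the spool directory and /var/tmp act as fallbacks when an earlier choice is not writable.

```python
import os


def core_file_dir(foreground, cwd, log_file, spool_dir,
                  writable=lambda path: os.access(path, os.W_OK)):
    """Return the directory a core file would land in, or None.

    foreground -- True when the daemon was started with -D
    cwd        -- current working directory of the daemon
    log_file   -- configured log file path (may be relative or empty)
    spool_dir  -- configured spool directory
    writable   -- predicate used to test a directory (injectable for tests)
    """
    if foreground:
        candidates = [cwd]
    else:
        candidates = []
        # Only a fully qualified log path (leading slash) is considered.
        if log_file and log_file.startswith("/"):
            candidates.append(os.path.dirname(log_file))
        candidates.append(spool_dir)
        candidates.append("/var/tmp")   # last resort
    for path in candidates:
        if writable(path):
            return path
    return None                          # no core file is produced
```

Passing the `writable` predicate as a parameter keeps the sketch testable without touching the real file system.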
It may be useful to experiment with different slurmd-specific configuration
parameters (e.g. timeouts) using a distinct configuration file. However, this
special configuration file will not be used by the slurmctld daemon or the
Slurm programs unless you specifically tell each of them to use it. To change
communication ports, the location of the temporary file system, or other
parameters used by other Slurm components, change the common configuration
file, slurm.conf.
If you are using configless mode with a login node that runs a lot of client
commands, you may consider running slurmd on that machine so it can manage a
cached version of the configuration files. Otherwise, each client command will
use the DNS record to contact the controller and get the configuration
information, which could place additional load on the controller.
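When the controllers for configless mode are given explicitly with --conf-server rather than found through DNS, the argument is a comma-separated list of host[:port] entries with the primary slurmctld first. The following Python sketch shows how a wrapper script might split such a value. The function name `parse_conf_server` and the host names in the usage note are hypothetical, and the sketch assumes plain host names or IPv4 addresses (no colons inside the host part).

```python
def parse_conf_server(value):
    """Split 'host[:port],host[:port],...' into (host, port) tuples.

    The port is None when it was not given. The first tuple is the
    primary controller, matching the order of the input.
    """
    servers = []
    for entry in value.split(","):
        host, sep, port = entry.strip().partition(":")
        if not host:
            raise ValueError(f"empty host in {entry!r}")
        # int() raises ValueError for an empty or non-numeric port.
        servers.append((host, int(port) if sep else None))
    return servers
```

For instance, `parse_conf_server("ctl1:6817,ctl2")` yields one entry with an explicit port and one without, in the order given.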
Copyright (C) 2002-2007 The Regents of the University of California. Copyright
(C) 2008-2010 Lawrence Livermore National Security. Copyright (C) 2010-2022
SchedMD LLC. Produced at Lawrence Livermore National Laboratory (cf,
DISCLAIMER).
This file is part of Slurm, a resource management program. For details, see
<https://slurm.schedmd.com/>.
Slurm is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
Slurm is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
/etc/slurm.conf
slurm.conf(5), slurmctld(8)