gfs2 - GFS2 reference guide
Overview of the GFS2 filesystem
GFS2 is a clustered filesystem, designed for sharing data between multiple nodes
connected to a common shared storage device. It can also be used as a local
filesystem on a single node. However, because the design is aimed at clusters,
this will usually result in lower performance than using a filesystem designed
specifically for single-node use.
GFS2 is a journaling filesystem, and one journal is required for each node that
will mount the filesystem. The one exception is spectator mounts, which are
equivalent to mounting a read-only block device. A spectator mount can neither
recover a journal nor write to the filesystem, so it does not require a journal
to be assigned to it.
The GFS2 documentation has been split into a number of sections:
mkfs.gfs2(8) Create a GFS2 filesystem
fsck.gfs2(8) The GFS2 filesystem checker
gfs2_grow(8) Growing a GFS2 filesystem
gfs2_jadd(8) Adding a journal to a GFS2 filesystem
tunegfs2(8) Tool to manipulate GFS2 superblocks
gfs2_edit(8) A GFS2 debug tool (use with caution)
- lockproto=LockProtoName
- This specifies which inter-node lock protocol is used by
the GFS2 filesystem for this mount, overriding the default lock protocol
name stored in the filesystem's on-disk superblock.
The LockProtoName must be one of the supported locking protocols;
currently these are lock_nolock and lock_dlm.
The default lock protocol name is written to disk when the filesystem is
created with mkfs.gfs2(8), using the -p option. It can be changed
on disk with the tunegfs2(8) command.
The lockproto mount option should be used only under special
circumstances in which you want to temporarily use a different lock
protocol without changing the on-disk default. Using the incorrect lock
protocol on a cluster filesystem mounted from more than one node will
almost certainly result in filesystem corruption.
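As a hedged illustration of a temporary override, the sketch below prints (rather than runs) a mount command with lockproto set, and rejects any name other than the two protocols listed above. The helper name gfs2_mount_cmd, the device path, and the mount point are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a mount command that overrides the
# on-disk lock protocol for a single mount.
# Usage: gfs2_mount_cmd PROTO DEVICE MOUNTPOINT
gfs2_mount_cmd() {
    proto=$1 dev=$2 mnt=$3
    case $proto in
        lock_nolock|lock_dlm) ;;   # the two protocols named in the text
        *) echo "unsupported lock protocol: $proto" >&2; return 1 ;;
    esac
    echo "mount -t gfs2 -o lockproto=$proto $dev $mnt"
}

# Placeholder device and mount point; substitute your own.
gfs2_mount_cmd lock_nolock /dev/vg_shared/lv_gfs2 /mnt/gfs2
```

Printing the command first makes it easy to review before running it by hand. Remember the warning above about using the wrong protocol on a filesystem mounted from more than one node.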
- locktable=LockTableName
- This specifies the identity of the cluster and of the
filesystem for this mount, overriding the default cluster/filesystem
identity stored in the filesystem's on-disk superblock. The
cluster/filesystem name is recognized globally throughout the cluster, and
establishes a unique namespace for the inter-node locking system, enabling
the mounting of multiple GFS2 filesystems.
The format of LockTableName is lock-module-specific. For
lock_dlm, the format is clustername:fsname. For
lock_nolock, the field is ignored.
The default cluster/filesystem name is written to disk when the filesystem
is created with mkfs.gfs2(8), using the -t option. It can be
changed on disk with the tunegfs2(8) command.
The locktable mount option should be used only under special
circumstances in which you want to mount the filesystem in a different
cluster, or mount it as a different filesystem name, without changing the
on-disk default.
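The clustername:fsname format used by lock_dlm can be checked before it is passed to mount. The following is a minimal sketch, assuming only that the name has exactly one colon with non-empty text on both sides; the function name split_locktable and the example names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a lock_dlm lock table name (clustername:fsname).
# Prints "cluster=... fs=..." on success, returns 1 on a malformed name.
split_locktable() {
    table=$1
    case $table in
        *:*:*|:*|*:|"") return 1 ;;  # more than one colon, or an empty part
        *:*) ;;                      # exactly one colon, both parts non-empty
        *) return 1 ;;               # no colon at all
    esac
    echo "cluster=${table%%:*} fs=${table#*:}"
}

split_locktable mycluster:shared01   # prints: cluster=mycluster fs=shared01
```

A name validated this way could then be supplied as locktable=mycluster:shared01 in the mount options.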
- localflocks
- This flag tells GFS2 that it is running as a local (not
clustered) filesystem, so it can allow the kernel VFS layer to do all
flock and fcntl file locking. When running in cluster mode, these file
locks require inter-node locks, and require the support of GFS2. When
running locally, better performance is achieved by letting VFS handle the
whole job.
This is turned on automatically by the lock_nolock module.
- errors=[panic|withdraw]
- Setting errors=panic causes GFS2 to oops when encountering
an error that would otherwise cause the mount to withdraw or print an
assertion warning. The default setting is errors=withdraw. This option
should not be used in a production system. It replaces the earlier
debug option on kernel versions 2.6.31 and above.
- acl
- Enables support for POSIX Access Control Lists (see acl(5))
within GFS2.
- spectator
- Mount this filesystem using a special form of read-only
mount. The mount does not use one of the filesystem's journals. The node
is unable to recover journals for other nodes.
- norecovery
- A synonym for spectator.
- suiddir
- Sets owner of any newly created file or directory to be
that of parent directory, if parent directory has S_ISUID permission
attribute bit set. Sets S_ISUID in any new directory, if its parent
directory's S_ISUID is set. Strips all execution bits on a new file, if
parent directory owner is different from owner of process creating the
file. Set this option only if you know why you are setting it.
- quota=[off|account|on]
- Turns quotas on or off for a filesystem. Setting the quotas
to be in the "account" state causes the per UID/GID usage
statistics to be correctly maintained by the filesystem, but limit and warn
values are ignored. The default value is "off".
- discard
- Causes GFS2 to generate "discard" I/O requests
for blocks which have been freed. These can be used by suitable hardware
to implement thin-provisioning and similar schemes. This feature is
supported in kernel version 2.6.30 and above.
- barrier
- This option, which defaults to on, causes GFS2 to send I/O
barriers when flushing the journal. The option is automatically turned off
if the underlying device does not support I/O barriers. We highly
recommend the use of I/O barriers with GFS2 at all times unless the block
device is designed so that it cannot lose its write cache content (e.g.
it's on a UPS, or it doesn't have a write cache).
- commit=secs
- This is similar to the ext3 commit= option in that
it sets the maximum number of seconds between journal commits if there is
dirty data in the journal. The default is 60 seconds. This option is only
provided in kernel versions 2.6.31 and above.
- data=[ordered|writeback]
- When data=ordered is set, the user data modified by a
transaction is flushed to the disk before the transaction is committed to
disk. This should prevent the user from seeing uninitialized blocks in a
file after a crash. data=writeback mode writes the user data to the disk
at any time after it's dirtied. This doesn't provide the same consistency
guarantee as ordered mode, but it should be slightly faster for some
workloads. The default is ordered mode.
- meta
- This option results in selecting the meta filesystem root
rather than the normal filesystem root. This option is normally only used
by the GFS2 utility functions. Altering any file on the GFS2 meta
filesystem may render the filesystem unusable, so only experts in the GFS2
on-disk layout should use this option.
- quota_quantum=secs
- This sets the number of seconds for which a change in the
quota information may sit on one node before being written to the quota
file. This is the preferred way to set this parameter. The value is an
integer number of seconds greater than zero. The default is 60 seconds.
Shorter settings result in faster updates of the lazy quota information
and less likelihood of someone exceeding their quota. Longer settings make
filesystem operations involving quotas faster and more efficient.
- statfs_quantum=secs
- Setting statfs_quantum to 0 is the preferred way to set the
slow version of statfs. The default value is 30 seconds, which sets the
maximum time period before statfs changes will be synced to the master
statfs file. This can be adjusted to allow for faster, less accurate
statfs values or slower, more accurate values. When set to 0, statfs will
always report the true values.
- statfs_percent=value
- This setting provides a bound on the maximum percentage
change in the statfs information on a local basis before it is synced back
to the master statfs file, even if the time period has not expired. If the
setting of statfs_quantum is 0, then this setting is ignored.
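The interaction between statfs_quantum and statfs_percent can be sketched as a small helper that assembles the statfs part of an option string. The function name statfs_opts is hypothetical. Omitting statfs_percent when the quantum is 0 is a choice made here for tidiness, based only on the statement above that the setting is ignored in that case.

```shell
#!/bin/sh
# Hypothetical helper: build the statfs portion of a GFS2 -o option string.
# Usage: statfs_opts QUANTUM_SECS [PERCENT]
statfs_opts() {
    quantum=$1 percent=${2:-}
    opts="statfs_quantum=$quantum"
    # statfs_percent is ignored when statfs_quantum is 0, so leave it out.
    if [ "$quantum" -ne 0 ] && [ -n "$percent" ]; then
        opts="$opts,statfs_percent=$percent"
    fi
    echo "$opts"
}

statfs_opts 30 10   # prints: statfs_quantum=30,statfs_percent=10
statfs_opts 0 10    # prints: statfs_quantum=0
```

The resulting string could be passed to mount after -o, alone or joined with other options by commas.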
- rgrplvb
- This flag tells gfs2 to look for information about a
resource group's free space and unlinked inodes in its glock lock value
block. This keeps gfs2 from having to read in the resource group data from
disk, speeding up allocations in some cases. This option was added in the
3.6 Linux kernel. Prior to this kernel, no information was saved to the
resource group lvb. Note: To safely turn on this option, all nodes
mounting the filesystem must be running at least a 3.6 Linux kernel. If
any nodes had previously mounted the filesystem using older kernels, the
filesystem must be unmounted on all nodes before it can be mounted with
this option enabled. This option does not need to be enabled on all nodes
using a filesystem.
- loccookie
- This flag tells gfs2 to use location based readdir cookies,
instead of its usual filename hash readdir cookies. The filename hash
cookies are not guaranteed to be unique, and as the number of files in a
directory increases, so does the likelihood of a collision. NFS requires
readdir cookies to be unique, which can cause problems with very large
directories (over 100,000 files). With this flag set, gfs2 will try to
give out location based cookies. Since the cookie is 31 bits, gfs2 will
eventually run out of unique cookies, and will fall back to using hash
cookies. The maximum number of files that could have unique location
cookies assuming perfectly even hashing and names of 8 or fewer characters
is 1,073,741,824. An average directory should be able to give out well
over half a billion location based cookies. This option was added in the
4.5 Linux kernel. Prior to this kernel, gfs2 did not add directory entries
in a way that allowed it to use location based readdir cookies.
Note: To safely turn on this option, all nodes mounting the
filesystem must be running at least a 4.5 Linux kernel. If this option is
only enabled on some of the nodes mounting a filesystem, the cookies
returned by nodes using this option will not be valid on nodes that are
not using this option, and vice versa. Finally, when first enabling this
option on a filesystem that had been previously mounted without it, you
must make sure that there are no outstanding cookies being cached by other
software, such as NFS.
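The figures quoted for loccookie can be checked with shell arithmetic. This sketch only verifies the numbers: a 31-bit cookie has 2^31 possible values, and the quoted maximum of 1,073,741,824 equals 2^30, half of that space. The text does not say how the remaining bit is used, and the sketch does not assume anything about it. It relies on the shell using arithmetic wider than 32 bits, which is typical on 64-bit systems.

```shell
#!/bin/sh
# Arithmetic check of the cookie figures quoted in the loccookie description.
cookie_space=$((1 << 31))   # number of distinct values in a 31-bit cookie
quoted_max=1073741824       # maximum quoted in the text

echo "31-bit cookie values: $cookie_space"
echo "quoted maximum:       $quoted_max"
# Prints 1 if the quoted maximum is exactly 2^30, otherwise 0.
echo "quoted maximum is 2^30: $(( quoted_max == (1 << 30) ))"
```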
GFS2 clustering is driven by the dlm, which depends on dlm_controld to provide
clustering from userspace. dlm_controld clustering is built on corosync
cluster/group membership and messaging. GFS2 also requires clustered lvm which
is provided by lvmlockd or, previously, clvmd. Refer to the documentation for
each of these components and ensure that they are configured before setting up
a GFS2 filesystem. Also refer to your distribution's documentation for any
specific support requirements.
Ensure that gfs2-utils is installed on all nodes which mount the filesystem as
it provides scripts required for correct withdraw event response.
1. Create the gfs2 filesystem
mkfs.gfs2 -p lock_dlm -t cluster_name:fs_name -j num /path/to/storage
The cluster_name must match the name configured in corosync (and thus dlm). The
fs_name must be a unique name for the filesystem in the cluster. The -j option
is the number of journals to create; there must be one for each node that will
mount the filesystem.
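The journal count in step 1 can be derived from the list of nodes that will mount the filesystem. The sketch below prints the resulting mkfs.gfs2 command without running it. The cluster name, filesystem name, device path, and node names are all placeholders.

```shell
#!/bin/sh
# Print (do not run) an mkfs.gfs2 command with one journal per mounting node.
# Every name below is a placeholder; substitute your own values.
cluster_name=mycluster              # must match the corosync cluster name
fs_name=shared01                    # must be unique within the cluster
device=/dev/vg_shared/lv_gfs2
nodes="node1 node2 node3"           # nodes that need a journal
                                    # (spectator mounts do not need one)

# Count the words in $nodes by loading them as positional parameters.
set -- $nodes
journals=$#

mkfs_cmd="mkfs.gfs2 -p lock_dlm -t $cluster_name:$fs_name -j $journals $device"
echo "$mkfs_cmd"
```

If more nodes are added later, gfs2_jadd(8) can add journals to the existing filesystem.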
2. Mount the gfs2 filesystem
If you are using a clustered resource manager, see its documentation for
enabling a gfs2 filesystem resource. Otherwise, run:
mount /path/to/storage /mountpoint
Run "dlm_tool ls" to verify which nodes have each filesystem mounted.
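To confirm the mount on the local node only, the kernel's mount table can be filtered by filesystem type. This is a minimal sketch, assuming a Linux /proc/mounts with the usual whitespace-separated layout (device, mount point, type, options). The function name gfs2_mountpoints is hypothetical, and the check complements rather than replaces the cluster-wide view from dlm_tool.

```shell
#!/bin/sh
# Hypothetical helper: list mount points whose filesystem type is gfs2.
# Reads the mount table given as $1, defaulting to /proc/mounts.
gfs2_mountpoints() {
    awk '$3 == "gfs2" { print $2 }' "${1:-/proc/mounts}"
}

# Only query the live table where it exists (Linux).
[ -r /proc/mounts ] && gfs2_mountpoints
```

The command prints nothing when no GFS2 filesystem is mounted on the node.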
3. Shut down
If you are using a clustered resource manager, see its documentation for
disabling a gfs2 filesystem resource. Otherwise, run:
umount -a -t gfs2
mount(8) and umount(8) for general mount information,
chmod(1) and chmod(2) for access permission flags,
acl(5) for access control lists,
lvm(8) for volume management,
dlm_controld(8),
dlm_tool(8),
dlm.conf(5),
corosync(8),
corosync.conf(5),