NAME
CURRENT SUPPORT
DESCRIPTION
DEPENDENCIES
To snapshot VRAM and other GPU device state, the plugin requires an
updated version of the amdkfd (amdgpu) driver. The kernel patches are
currently under review.
criu 3.16
This work is rebased on the latest criu release available at the
time of writing.
OPTIONS
KFD_FW_VER_CHECK
Enable or disable the firmware version check. If enabled, the
firmware version on the restored GPU must be greater than or equal to
the firmware version on the checkpointed GPU. Default: Enabled
E.g: KFD_FW_VER_CHECK=0
KFD_SDMA_FW_VER_CHECK
Enable or disable the SDMA firmware version check. If enabled, the
SDMA firmware version on the restored GPU must be greater than or equal
to the SDMA firmware version on the checkpointed GPU. Default: Enabled
E.g: KFD_SDMA_FW_VER_CHECK=0
KFD_CACHES_COUNT_CHECK
Enable or disable the caches count check. If enabled, the caches
count on the restored GPU must be greater than or equal to the caches
count on the checkpointed GPU. Default: Enabled
E.g: KFD_CACHES_COUNT_CHECK=0
KFD_NUM_GWS_CHECK
Enable or disable the num_gws check. If enabled, num_gws on the
restored GPU must be greater than or equal to num_gws on the
checkpointed GPU. Default: Enabled
E.g: KFD_NUM_GWS_CHECK=0
KFD_VRAM_SIZE_CHECK
Enable or disable the VRAM size check. If enabled, the VRAM size on
the restored GPU must be greater than or equal to the VRAM size on the
checkpointed GPU. Default: Enabled
E.g: KFD_VRAM_SIZE_CHECK=0
KFD_NUMA_CHECK
Enable or disable the NUMA CPU region check. If enabled, the plugin
will restore GPUs that belong to one CPU NUMA region to the same CPU
NUMA region. Default: Enabled
E.g: KFD_NUMA_CHECK=1
KFD_CAPABILITY_CHECK
Enable or disable the capability check. If enabled, the capability
on the restored GPU must be equal to the capability on the checkpointed
GPU. Default: Enabled
E.g: KFD_CAPABILITY_CHECK=1
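The comparison rules described above can be illustrated with a minimal Python sketch. It is not the plugin's implementation. The property names (fw_version, vram_size, and so on), the helper names check_enabled and failed_checks, and the rule that only the value 0 disables a check are assumptions made for illustration. The NUMA check is left out because it describes placement rather than a simple comparison.

```python
import os

# Hypothetical mapping from each option to the device property it guards.
# The property names are illustrative and not taken from the plugin.
GE_CHECKS = {
    "KFD_FW_VER_CHECK": "fw_version",
    "KFD_SDMA_FW_VER_CHECK": "sdma_fw_version",
    "KFD_CACHES_COUNT_CHECK": "caches_count",
    "KFD_NUM_GWS_CHECK": "num_gws",
    "KFD_VRAM_SIZE_CHECK": "vram_size",
}
EQ_CHECKS = {"KFD_CAPABILITY_CHECK": "capability"}


def check_enabled(name, env=None):
    """Assumed rule: a check is enabled unless its variable is set to "0"."""
    env = os.environ if env is None else env
    return env.get(name, "1").strip() != "0"


def failed_checks(checkpointed, restored, env=None):
    """Return the names of the enabled checks that the restored GPU fails."""
    failures = []
    # "Greater than or equal" checks: restored value must not be smaller.
    for name, prop in GE_CHECKS.items():
        if check_enabled(name, env) and restored[prop] < checkpointed[prop]:
            failures.append(name)
    # Equality checks: restored value must match exactly.
    for name, prop in EQ_CHECKS.items():
        if check_enabled(name, env) and restored[prop] != checkpointed[prop]:
            failures.append(name)
    return failures
```

With these assumptions, a restored GPU that has less VRAM than the checkpointed one fails only KFD_VRAM_SIZE_CHECK, and passing an environment that contains KFD_VRAM_SIZE_CHECK=0 makes the same comparison succeed.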
AUTHOR
COPYRIGHT
12/20/2022