We are dealing with a problem where an NVMe drive fails every so often, more often than it really should. While we try to make sense of the hardware issue, we are also looking at recovery options. We are currently running Ubuntu 20.04 LTS on XFS with a single NVMe disk. When the disk fails, the following errors are reported:

```
Nov 6, 2022 @ 20:27:12.000 [1095930.104279] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
Nov 6, 2022 @ 20:27:12.000 [1095930.451711] nvme nvme0: 64/0/0 default/read/poll queues
Nov 6, 2022 @ 20:27:12.000 [1095930.453846] blk_update_request: I/O error, dev nvme0n1, sector 34503744 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
```

and the system becomes completely unresponsive. I am looking for a way to bring the system down when this happens, so the other nodes in our cluster can take over the work. However, since the system is unresponsive and the disk is presumably in read-only mode, we are stuck in a sort of zombie state where the processes are still running but no longer have access to the disk.

On ext3/ext4 there is a mount option to take the system down:

> errors={continue|remount-ro|panic}
> Define the behavior when an error is encountered. (Either ignore errors and just mark the filesystem erroneous and continue, or remount the filesystem read-only, or panic and halt the system.) The default is set in the filesystem superblock, and can be changed using tune2fs(8).

Is there an equivalent for XFS? I didn't find anything similar in the XFS man page. Any other suggestions for handling this better are also welcome.
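For comparison, this is how the panic-on-error behavior is enabled on ext4, either persistently in the superblock or per mount. A sketch only; the device path and mount point are placeholders for your own setup:

```shell
# Persistently set the error behavior in the superblock (ext3/ext4 only,
# there is no tune2fs equivalent for XFS). /dev/nvme0n1p1 is a placeholder.
tune2fs -e panic /dev/nvme0n1p1

# Or set it per mount in /etc/fstab:
# /dev/nvme0n1p1  /data  ext4  defaults,errors=panic  0  2

# A panic alone leaves the machine hung; make it reboot so the node
# drops out of the cluster cleanly:
sysctl -w kernel.panic=10   # reboot 10 seconds after a kernel panic
```

Note that `errors=panic` combined with `kernel.panic` turns a disk failure into an automatic reboot, which is what lets the rest of the cluster take over.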