Re: Misbehavior of md-raid RAID on failed NVMe.

09.06.2022 1:48, Roger Heflin wrote:
> You might want to see if specific disk devices are getting
> reset/rebooted, the more often they are getting reset/rebooted the
> higher chance of data loss. The vendor's solution in the case I know
> about was to treat unrequested device resets/reboots as a failing
> device, and disable and replace it.
How can we detect these resets/reboots? Is there a counter in the kernel, or in the NVMe device itself?
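
(Assuming nvme-cli and smartmontools are installed, and with /dev/nvme0 only as an example device name, these are places one could look; the exact kernel log wording varies between kernel versions:)

  nvme error-log /dev/nvme0    # error log entries recorded by the controller
  nvme smart-log /dev/nvme0    # drive-kept counters: media_errors, num_err_log_entries, unsafe_shutdowns
  smartctl -a /dev/nvme0       # smartmontools reports the same NVMe health/error counters
  dmesg | grep -iE 'nvme.*(reset|timeout)'   # kernel-side traces of the driver resetting the controller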

> I don't know if this is what is causing your issue or not, but it is a
> possible issue, and an issue that is hard to write code to handle.

We see log messages explicitly reporting an I/O error and data not being written:

[Tue Jun  7 09:58:45 2022] I/O error, dev nvme0n1, sector 538918912 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
[Tue Jun  7 09:58:45 2022] I/O error, dev nvme0n1, sector 538988816 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
[Tue Jun  7 09:58:48 2022] I/O error, dev nvme0n1, sector 126839568 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
[Tue Jun  7 09:58:48 2022] I/O error, dev nvme0n1, sector 126888224 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
[Tue Jun  7 09:58:48 2022] I/O error, dev nvme0n1, sector 126894288 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0

I think that is reason enough to mark the array member as failed, since it now holds inconsistent data.
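
(If md does not kick the member out on its own, it can at least be failed and removed by hand until the drive is replaced; a rough sketch, with /dev/md0 and /dev/nvme0n1 only as example names:)

  cat /proc/mdstat                      # current array and member state
  mdadm --detail /dev/md0               # per-member state as md sees it
  mdadm /dev/md0 --fail /dev/nvme0n1    # mark the member faulty manually
  mdadm /dev/md0 --remove /dev/nvme0n1  # then remove it before replacing the drive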



