Re: 4 out of 16 drives show up as 'removed'

Dear Neil,
this issue turned out fine for the OP (and thanks for your continued support); however, this exact situation is my worst nightmare with MD RAID.

It seems to me that MD has no mechanism to guard against a disconnected cable that carries multiple drives, and I think this could leave the user badly confused and could then lead to major data loss.

I think it should be possible, in principle, to implement a mechanism that distinguishes a cable disconnect (affecting several drives) from a single failed drive:

The technique would be:
BEFORE failing a drive for any symptom that *could* be caused by a cable disconnect, (maybe wait a couple of seconds and then) perform a read and/or a write (uncached, mandatorily from the platters, and synced in the case of a write) on each drive of the array. If multiple drives that were believed to be working do not respond to such a read/write command, assume a cable has been disconnected and either block the array (is there a blocked state like for other Linux block devices? If not, it should be implemented) or set it read-only. Or, worst case, disassemble the array. But DO NOT proceed with failing the drive. OTOH, if all the other drives respond correctly, assume it's not a cable problem and go ahead with failing the drive that was supposed to be failed.
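To make the proposal a bit more concrete, here is a minimal Python sketch of the decision step only. It is not how MD behaves today (MD lives in the kernel and is written in C), and the names `Action`, `decide_on_error` and the `probe` callback are hypothetical. A real probe would have to issue an uncached (e.g. O_DIRECT) read and/or a synced write with a timeout, since a drive behind an unplugged cable may never answer; here the probe is simply passed in, so the logic can be exercised on a simulated 16-drive array.

```python
import time
from enum import Enum
from typing import Callable, Sequence


class Action(Enum):
    FAIL_DRIVE = "fail_drive"    # only the suspect is bad: fail it as MD does today
    BLOCK_ARRAY = "block_array"  # several members went silent: suspect a cable


def decide_on_error(
    suspect: str,
    members: Sequence[str],
    probe: Callable[[str], bool],
    settle_seconds: float = 2.0,
) -> Action:
    """Decide whether to fail `suspect` or to block the whole array.

    `members` is assumed to list only the drives still believed to be working.
    `probe(dev)` is a hypothetical callback: it should do an uncached read
    and/or synced write on `dev` and return True if the drive answered.
    """
    # Give a flaky link a moment to settle before judging the other drives.
    time.sleep(settle_seconds)

    # Probe every *other* member; the suspect already reported an error.
    silent = [dev for dev in members if dev != suspect and not probe(dev)]

    if silent:
        # The suspect plus at least one more drive are unresponsive: treat it as
        # a cable problem. Write nothing to the superblocks; block the array or
        # set it read-only instead and let the admin investigate.
        return Action.BLOCK_ARRAY

    # All other drives answered, so this looks like a genuine single-drive failure.
    return Action.FAIL_DRIVE


if __name__ == "__main__":
    drives = ["sd" + c for c in "abcdefghijklmnop"]  # 16 simulated members
    on_cable = {"sdm", "sdn", "sdo", "sdp"}           # one cable carries four drives
    print(decide_on_error("sdm", drives, lambda d: d not in on_cable, settle_seconds=0))
    print(decide_on_error("sdc", drives, lambda d: d != "sdc", settle_seconds=0))
```

The point of the design is that the suspect drive is failed only when every other member answers; a second silent drive is taken as evidence of a shared cable, and in that case no "failed" marks are written to the good drives' metadata.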

The current behaviour is not good because MD starts recording each failed drive in the metadata of the good drives before discovering that so many drives have failed that the array cannot be kept running at all.

So you end up with an array that is down, but which is also in an inconsistent state (I think writes could have been performed between the first detected failure and the last one, so the array would indeed be inconsistent), and which no longer assembles cleanly.

Thank you

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

