Dear Neil,
this issue turned out OK for the OP (and thanks for your continued support);
however, exactly this situation is my worst nightmare regarding MD RAID.
It seems to me that MD has no mechanism to safeguard against the situation
of a disconnecting cable (one holding multiple drives), and I think this
could cause major confusion for the user, potentially followed by major
data loss.
I think it should be possible, in principle, to implement a mechanism that
discriminates between a cable disconnect (affecting multiple drives) and a
single failed drive.
The technique would be: BEFORE failing a drive with any symptom that
*could* be caused by a cable disconnect, (perhaps after waiting a couple
of seconds) perform a read and/or a write on each of the drives of the
array: uncached, mandatorily from the platters, and synced in the case of
a write. If multiple drives that were believed to be working do not
respond to such a read/write command, then assume a cable has been
disconnected and either block the array (is there a blocked state like
for other Linux block devices? If not, it should be implemented) or set
it read-only. In the worst case, disassemble the array, but DO NOT
proceed with failing the drive. OTOH, if all the other drives respond
correctly, assume it is not a cable problem and go ahead with failing the
drive that was supposed to be failed.
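A minimal sketch of that decision logic, in Python purely for
illustration (the real thing would live in the kernel's MD driver;
`probe` stands in for the uncached, synced read/write described above,
and all names here are hypothetical):

```python
def probe(drive):
    """Stand-in for an uncached, synchronous read/write that must
    reach the platters (e.g. O_DIRECT plus fsync in real code)."""
    return drive.get("responds", False)

def decide_action(suspect, other_drives):
    """Before failing `suspect`, probe every other active drive.

    If other supposedly healthy drives also stop responding, the
    likely cause is a shared path (cable or controller), so the
    array should be blocked rather than drives failed one by one.
    """
    unresponsive = [d["name"] for d in other_drives if not probe(d)]
    if unresponsive:
        # The suspect plus at least one more drive is gone: treat it
        # as a cable disconnect, block the array (or set read-only),
        # and do NOT record the suspect as failed in the metadata.
        return "block-array", unresponsive
    # Every other drive answered: a genuine single-drive failure,
    # so proceed exactly as MD does today.
    return "fail-drive", [suspect["name"]]
```

For example, if sda times out and a probe shows sdb is unreachable too,
the sketch returns "block-array" instead of failing sda.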
The current behaviour is not good because MD starts recording the failed
drives in the metadata of the good drives before discovering that so many
drives have failed that the array cannot be kept running at all.
So you end up with an array that is down, but which is also in an
inconsistent state (writes could have been performed between the first
discovered failure and the last, so the array would indeed be
inconsistent) and which no longer assembles cleanly.
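To illustrate that end state, here is a toy model (Python, every name
hypothetical, with the failures assumed to be discovered serially) of MD
recording failures only in the superblocks it can still write, while the
unplugged drives keep stale superblocks that claim everything is fine:

```python
def simulate_cable_pull(all_drives, disconnected, min_active):
    """Toy model: MD discovers the unplugged drives one at a time and
    records each failure in the superblocks of the still-connected
    drives. Returns (superblocks, array_still_running), where each
    superblock is the set of drives that drive believes have failed."""
    superblocks = {d: set() for d in all_drives}
    reachable = [d for d in all_drives if d not in disconnected]
    for victim in disconnected:
        # Each newly discovered failure is written to every drive MD
        # can still reach; the unplugged drives never see the update.
        for d in reachable:
            superblocks[d].add(victim)
    return superblocks, len(reachable) >= min_active

# A 4-drive RAID5 needs 3 active members; pulling a cable holding
# two drives leaves the array down with disagreeing superblocks.
sb, running = simulate_cable_pull(["sda", "sdb", "sdc", "sdd"],
                                  ["sdc", "sdd"], min_active=3)
```

Here the survivors record both sdc and sdd as failed, while sdc and sdd
themselves record nothing, which is exactly why the array no longer
assembles cleanly after the cable is plugged back in.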
Thank you
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html