Re: 4 out of 16 drives show up as 'removed'

Asdo <asdo@xxxxxxxxxxxxx> · Fri, 09 Dec 2011 17:40:54 +0100

Dear Neil,
this issue turned out OK for the OP (and thanks for your continued 
support). However, exactly this situation is my worst nightmare 
regarding MD RAID.

It seems to me that MD has no mechanism to guard against a cable 
disconnect (affecting multiple drives at once), and I think this could 
leave the user badly puzzled, potentially followed by major data loss.

I think it should be possible in principle to implement a mechanism 
that discriminates between a cable disconnect (affecting multiple 
drives) and a single failed drive:

The technique would be:
BEFORE failing a drive on any symptom that *could* be caused by a 
cable disconnect, (maybe wait a couple of seconds and then) perform a 
read and/or a write (not cached, mandatorily from the platters, and 
synchronous in the case of a write) on each of the drives of the array. 
If multiple drives which were believed to be working do not respond to 
such a read/write command, then assume a cable has been disconnected 
and either block the array (is there a blocked state like for other 
Linux block devices? If not, it should be implemented) or set it 
read-only. Or, worst case, disassemble the array. But DO NOT proceed 
with failing the drive. OTOH, if all the other drives respond 
correctly, assume it's not a cable problem and go ahead failing the 
drive which was supposed to be failed.
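
To make the idea concrete, here is a rough userspace sketch of the 
decision logic in Python. It is only an illustration, not how MD 
works today: the names classify_failure, direct_read_probe and Verdict 
are made up for this example, and I am assuming that even one other 
silent member is enough to suspect the cable (the exact threshold is 
debatable). A real implementation would live in the kernel.

```python
import mmap
import os
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    FAIL_DRIVE = "fail_drive"        # only the suspect is bad: fail it as usual
    SUSPECT_CABLE = "suspect_cable"  # other members are silent too: freeze the array


def direct_read_probe(path: str, size: int = 4096) -> bool:
    """Hypothetical probe: one uncached read from the start of the device.

    O_DIRECT (Linux only) bypasses the kernel page cache; it does not
    guarantee that the drive's own cache is bypassed as well.
    Returns False on any error, including platforms without O_DIRECT.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except (OSError, AttributeError):
        return False
    try:
        # An anonymous mmap is page-aligned, which O_DIRECT reads require.
        buf = mmap.mmap(-1, size)
        try:
            return os.readv(fd, [buf]) == size
        finally:
            buf.close()
    except OSError:
        return False
    finally:
        os.close(fd)


def classify_failure(suspect: str, members: Iterable[str],
                     probe: Callable[[str], bool]) -> Verdict:
    """Probe every member except the suspect before failing anything.

    Assumption: any other member that does not answer points to a shared
    cause (cable, backplane, controller) rather than a single bad drive.
    """
    others = [m for m in members if m != suspect]
    silent = [m for m in others if not probe(m)]
    return Verdict.SUSPECT_CABLE if silent else Verdict.FAIL_DRIVE
```

With a SUSPECT_CABLE verdict the caller would block the array or set 
it read-only instead of recording any failure; with FAIL_DRIVE it 
would proceed as it does today. The probe is passed in as a parameter 
so the decision logic can be exercised without real hardware.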

The current behaviour is not good because MD will start recording all 
the failed drives in the metadata of the good drives before 
discovering that so many drives have failed that the array cannot 
be kept running at all.

So you end up with an array that is down, but that is also in an 
inconsistent state (I think writes could have been performed between 
the first discovered failure and the last discovered failure, so the 
array would indeed be inconsistent) and that no longer assembles 
cleanly.

Thank you
