On Wed, 21 Nov 2012 08:17:57 -0800 Ross Boylan <ross@xxxxxxxxxxxxxxxx> wrote:

> After I failed and removed a partition, mdadm --examine seems to show
> that partition is fine.

Correct.
When a device fails it is assumed that it has failed and probably cannot
be written to.  So no attempt is made to write to it, and it will look
unchanged to --examine.
All the other devices in the array record the fact that that device is
now faulty, and their event counts are increased, so their idea of the
status of the various devices will take priority over the info stored on
the faulty device - should it still be readable.

>
> Perhaps related to this, I failed a partition and when I rebooted it
> came up as the sole member of its RAID array.

This is a bug which is fixed in my mdadm development tree, which will
eventually become mdadm-3.3.

Does the other device get assembled into a different array, so you end up
with two arrays (split brain)?

What can happen is that "mdadm --incremental /dev/whatever" is called on
each device, and that results in the correct array (with the non-failed
device) being assembled.
Then "mdadm -As" gets run; it sees the failed device, doesn't notice the
other array, and so assembles the failed device into an array of its own.
The fix causes "mdadm -As" to notice the arrays that "mdadm --incremental"
has created.

>
> Is this behavior expected? Is there a way to make the failures more
> convincing?

mdadm --zero /dev/whatever

after failing and removing the device.

Or unplug it and put it in an acid bath - that makes failure pretty
convincing.

NeilBrown
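
A minimal sketch of that sequence, assuming an array /dev/md0 with member
/dev/sdb1 (placeholder names, not taken from the thread above):

    # Compare the event count recorded in each member's superblock;
    # the failed member's count lags behind the surviving members'.
    mdadm --examine /dev/sdb1 | grep -i events

    # Fail the member, remove it from the array, then wipe its md
    # superblock so neither --incremental nor -As will re-assemble it.
    mdadm /dev/md0 --fail /dev/sdb1
    mdadm /dev/md0 --remove /dev/sdb1
    mdadm --zero-superblock /dev/sdb1

--zero-superblock is the long form of the --zero option mentioned above;
once it has run, mdadm --examine on that partition reports no md
superblock at all.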