Re: Bugreport ddf rebuild problems

NeilBrown <neilb@xxxxxxx> · Tue, 6 Aug 2013 10:16:33 +1000

On Mon, 05 Aug 2013 23:24:28 +0200 Martin Wilck <mwilck@xxxxxxxx> wrote:

> Hi Albert, Neil,
> 
> I just submitted a new patch series; patch 3/5 integrates your 2nd case
> as a new unit test and 4/5 should fix it.
> 
> However @Neil: I am not yet entirely happy with this solution. AFAICS
> there is a possible race condition here, if a disk fails and mdadm -CR
> is called to create a new array before the metadata reflecting the
> failure is written to disk. If a disk failure happens in one array,
> mdmon will call reconcile_failed() to propagate the failure to other
> already known arrays in the same container, by writing "faulty" to the
> sysfs state attribute. It can't do that for a new container though.
> 
> I thought that process_update() may need to check the kernel state of
> array members against meta data state when a new VD configuration record
> is received, but that's impossible because we can't call open() on the
> respective sysfs files. It could be done in prepare_update(), but that
> would require major changes, I wanted to ask you first.
> 
> Another option would be changing manage_new(). But we don't seem to have
> a suitable metadata handler method to pass the meta data state to the
> manager....
> 
> Ideas?

Thanks for the patches - I applied them all.

Is there a race here?  When "mdadm -C" looks at the metadata the device will
either be an active member of another array, or it will be marked faulty.
Either way mdadm won't use it.

It might get interesting if the first array was created to use only (say)
half of each device and the second array is sized to fit in the other half.
In that case "mdadm -C" might see that everything looks good, create the
array using the second half of the drive that has just failed, and give that
info to mdmon.

I suspect that ddf_open_new (which currently looks like it is just a stub)
needs to help out here.
When manage_new() gets told about a new array it will collect relevant info
from sysfs and call ->open_new() to make sure it matches the metadata.
ddf_open_new should check that all the devices in the array are recorded as
working in the metadata.  If any are failed, it can write 'faulty' to the
relevant state_fd.
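
Very roughly, and untested, the check I have in mind looks like the sketch
below.  The struct and function names are made up for illustration (they are
not the real mdadm types), and "meta_failed" stands in for whatever lookup
the DDF code would do against its own metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical, simplified stand-ins for the per-device and per-array
 * structures; the real mdadm structures carry far more state. */
struct sketch_dev {
	int state_fd;           /* open fd on the member's sysfs "state" file */
	bool meta_failed;       /* assumed: metadata records this disk as failed */
	struct sketch_dev *next;
};

struct sketch_array {
	struct sketch_dev *devs;
};

/* Write "faulty" to state_fd for every member that the metadata marks as
 * failed.  Returns the number of members marked, or -1 on a write error. */
static int sketch_open_new(struct sketch_array *a)
{
	static const char faulty[] = "faulty";
	const ssize_t len = (ssize_t)(sizeof(faulty) - 1);
	int marked = 0;

	assert(a != NULL);
	for (struct sketch_dev *d = a->devs; d; d = d->next) {
		if (!d->meta_failed)
			continue;       /* metadata says this member is working */
		if (write(d->state_fd, faulty, (size_t)len) != len)
			return -1;
		marked++;
	}
	return marked;
}
```

Working members are never touched, so an array whose metadata is clean goes
through this with no writes at all.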

Possibly the same thing can be done generically in manage_new() as you
suggested.  After the new array has been passed over to the monitor thread,
manage_new() could check if any devices should be failed much like
reconcile_failed() does and just fail them.
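
A sketch of the generic variant, again untested and with invented names.  It
assumes a new handler hook that answers "does the metadata consider this
disk failed?"; as far as this thread goes, no such method exists yet, so the
callback type and the toy metadata below are purely hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical, simplified member description; not mdadm's real struct. */
struct sketch_member {
	int major, minor;       /* device number of the member disk */
	int state_fd;           /* open fd on its sysfs "state" file */
};

/* Hypothetical handler hook: true if the metadata records major:minor
 * as failed. */
typedef bool (*sketch_is_failed_fn)(void *meta, int major, int minor);

/* Toy metadata used only to exercise the hook: one failed disk. */
struct sketch_meta {
	int failed_major, failed_minor;
};

static bool sketch_meta_is_failed(void *meta, int major, int minor)
{
	const struct sketch_meta *sm = meta;

	return sm->failed_major == major && sm->failed_minor == minor;
}

/* Generic check to run after the array has been handed to the monitor:
 * fail every member the handler reports as failed.  Returns how many
 * members were marked, or -1 on a write error. */
static int sketch_fail_stale_members(struct sketch_member *m, size_t n,
				     sketch_is_failed_fn is_failed, void *meta)
{
	int marked = 0;

	assert(is_failed != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!is_failed(meta, m[i].major, m[i].minor))
			continue;
		if (write(m[i].state_fd, "faulty", 6) != 6)
			return -1;
		marked++;
	}
	return marked;
}
```

The point of the callback is that the manager side stays format-agnostic and
only the handler needs to know how the metadata records a failure.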

Does that make any sense?  Did I miss something?

Thanks,
NeilBrown