Re: [mdadm git pull] support for removed disks / imsm updates

On Wed, Mar 4, 2009 at 3:41 PM, Neil Brown <neilb@xxxxxxx> wrote:
> On Friday February 27, dan.j.williams@xxxxxxxxx wrote:
>> 2/ Support for handling removed disks as currently all container
>> manipulations fail once a live disk is hot-unplugged.
>
> So this is when md thinks the device is in the array, but the device
> has actually been removed, so the block/dev file is missing or
> empty, or the status is not 'online'.
>
> But we only check for that if mdmon is running.  For some reason that
> seems odd, but I'm not really sure.
> Why do we want to treat this case differently depending on whether
> mdmon is running or not?

The thinking, dubious or otherwise, is that if mdmon is not running
then the administrator is in charge of managing the container, and
would want to know about these errors.  I could not convince myself
that we *always* wanted to ignore missing disks here... so I erred on
the conservative side.

However, we have already found another location where SKIP_GONE_DEVS
is needed, so part of me wonders about just making it the default?
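
For illustration, here is a minimal sketch of the kind of check being
discussed above; the disk_is_gone() helper name and the exact sysfs
paths are assumptions on my part, not the real mdadm code:

#include <stdio.h>
#include <string.h>

/*
 * Sketch only: roughly the check Neil describes above.  A member disk
 * counts as "gone" when its block/dev sysfs file is missing or empty,
 * or its state no longer reads "online".
 */
static int disk_is_gone(const char *devdir)
{
	char path[512], buf[64];
	FILE *f;

	/* block/dev should hold "major:minor"; missing or empty => gone */
	snprintf(path, sizeof(path), "%s/block/dev", devdir);
	f = fopen(path, "r");
	if (!f)
		return 1;
	if (!fgets(buf, sizeof(buf), f) || buf[0] == '\n') {
		fclose(f);
		return 1;
	}
	fclose(f);

	/* any state other than "online" also counts as gone */
	snprintf(path, sizeof(path), "%s/device/state", devdir);
	f = fopen(path, "r");
	if (f) {
		int online = fgets(buf, sizeof(buf), f) &&
			     strncmp(buf, "online", 6) == 0;
		fclose(f);
		if (!online)
			return 1;
	}
	return 0;
}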

>> 3/ An initial mdmon man page
>> 4/ imsm auto layout support
>> 5/ Updates to --incremental in pursuit of assembling external metadata
>> arrays in the initramfs via udev events
>
> Thanks.
>
> Most look good.
> My attention was caught by Create: wait_for container creation.
>
> I vaguely remember trying that and it didn't work.  Something about
> the md array not being in the right sort of state for udev to create a
> device, or something...  But I expect you have tested it so maybe I'm
> remembering something else.

It corrected a test script failure here, FWIW, but I'll keep an eye
out for container-creation deadlocks.

>>
>> The one "fix" that is missing from this update is to teach mdmon to kick
>> "non-fresh" drives similar to what the kernel does at initial assembly.
>> I dropped the attempt after realizing I would need to take an O_EXCL
>> open on the container in an awkward place.  I guess it is not necessary,
>> but it is a quirk of containers that known failed drives can be allowed
>> back into the container.
>
> I always thought it was a slightly odd quirk that if you had an array
> with failed drives, then stopped and restarted the array, those failed
> drives would no longer be there.
> My feeling is that it doesn't matter a great deal one way or the
> other.  The important thing is that when mdadm describes the state of
> an array, it describes it in a way that doesn't confuse people (an
> area in which v1.x metadata lets us down at the moment).

Ok, that clarifies things...
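
Regarding the O_EXCL point above: my understanding is that the only
machinery needed is an exclusive open of the container's device node,
which the kernel fails with EBUSY while anyone else holds the device.
A sketch of that, with an invented helper name, rather than anything
taken from mdadm:

#include <fcntl.h>
#include <errno.h>
#include <unistd.h>

/*
 * Sketch only: O_EXCL on a block device node makes open() fail with
 * EBUSY while the device is held elsewhere, so a successful open is
 * an exclusive claim until the fd is closed.
 */
static int open_container_excl(const char *devname)
{
	int fd = open(devname, O_RDONLY | O_EXCL);

	if (fd < 0)
		return -errno;	/* usually -EBUSY: container is busy */
	return fd;		/* caller close()s to drop the claim */
}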

[..]
> For now, all these patches have been pulled and pushed to neil.brown.name/mdadm

Thanks!

--
Dan
