On Mon, 10 May 2010 12:59:33 -0400 Bill Davidsen <davidsen@xxxxxxx> wrote:

> Neil Brown wrote:
> > On Fri, 7 May 2010 00:40:40 -0700 (PDT)
> > Joe Bryant <tenminjoe@xxxxxxxxx> wrote:
> >
> >>> I'll see about fixing that.
> >>>
> >>> Recreate the array with "--metadata 1.0" or "--metadata 1.2" and
> >>> it should work better.
> >>>
> >> That's great, thanks very much.
> >>
> > It turns out it is a bit more subtle than that, though that approach
> > may work for you.
> > If you make an odd number of changes to the metadata, it will switch
> > from doing what you want to not doing it.
> > e.g. if /dev/foo is your spare device, then
> >
> >   mdadm /dev/md0 -f /dev/foo
> >   mdadm /dev/md0 -r /dev/foo
> >   mdadm /dev/md0 -a /dev/foo
> >
> > will switch between working and not working.  v0.90 metadata starts
> > out not working; v1.x starts out working.
> >
>
> So we can assume that the little dance steps above will make 1.x
> misbehave in the same way?

Yes.

>
> Could you explain (or point to an explanation of) why this whole
> odd/even thing exists?
>

Maybe ....

For backwards compatibility, the event counts in all the devices in an
array must not differ by more than 1.  And if the information in the
superblocks differs, then the event counts must differ too, to ensure
that the most recent information is used when the array is restarted.

Consequently, if the event counts are uniform across an array, it is
safe to just mark the superblocks on the active drives as 'dirty',
leaving the spare drives alone.  To then mark the array as 'clean'
again, we must either update the metadata on the spares (which we
don't want to do) or decrease the event count on the active devices.

However, there are cases where decreasing the event count on active
devices is not safe.  If the array was dirty and we failed a device,
that would update the event count everywhere except on the failed
device.  When we then want to mark the array as 'clean', it is *not*
safe to decrement the event count, as the failed drive could then look
like it is still a valid member of the array.

I had the idea that I could encode this extra information in the
odd/even status of the event count.  However, now that I explain it
out loud, it doesn't actually make a lot of sense.  I should keep the
"it is safe to decrement the event count" state in some internal state
variable and assume it is 'false' when an array is started.  That
would be heaps cleaner and would actually do the right thing.

Theoretically, when the spares are one event behind the active devices
and we need to update them all, we should update the spares first,
then the rest.  If we don't, and there is a crash at the wrong time,
some spares could be 2 events behind the most recent device.  However,
that is a fairly unlikely race to lose, and the cost is only having a
spare device fall out of the array, which is quite easy to put back,
so I won't worry too much about it.

So if you haven't seen a patch to fix this in a week or two, please
remind me.

Thanks,
NeilBrown
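
To make the bookkeeping above concrete, here is a much-simplified
sketch in C of the event-count rules and the proposed "safe to
decrement" flag.  This is not the actual md driver code: the
structures and names (mark_dirty, mark_clean, fail_device,
can_decrement) are invented for illustration, and real superblock
writes, locking, and I/O ordering are all omitted.

/*
 * Hypothetical sketch of the event-count scheme described above.
 * Not the real md code; all names are invented for illustration.
 */
#include <stdbool.h>
#include <stddef.h>

enum dev_state { ACTIVE, SPARE, FAILED };

struct dev {
	enum dev_state state;
	unsigned long long events;  /* event count in this superblock */
};

struct array {
	struct dev *devs;
	size_t ndevs;
	bool clean;
	/*
	 * "It is safe to decrement the event count": true only when the
	 * current dirty state came from a plain +1 on the active devices
	 * with no failure since.  Assumed false when the array starts.
	 */
	bool can_decrement;
};

static unsigned long long max_events(const struct array *a)
{
	unsigned long long m = 0;
	size_t i;

	for (i = 0; i < a->ndevs; i++)
		if (a->devs[i].events > m)
			m = a->devs[i].events;
	return m;
}

/* clean -> dirty: bump only the active devices; spares fall one
 * behind, which the backwards-compatibility rule (spread of at most
 * 1) permits. */
static void mark_dirty(struct array *a)
{
	size_t i;

	if (!a->clean)
		return;
	for (i = 0; i < a->ndevs; i++)
		if (a->devs[i].state == ACTIVE)
			a->devs[i].events++;
	a->clean = false;
	a->can_decrement = true;
}

/* dirty -> clean: either undo the bump (cheap, spares untouched) or
 * rewrite every live superblock to one value above the current
 * maximum, so no stale device can ever look current.  Spares are
 * written first, so a crash mid-update leaves them at most one event
 * behind. */
static void mark_clean(struct array *a)
{
	unsigned long long target = max_events(a) + 1;
	size_t i;
	int pass;

	if (a->clean)
		return;
	if (a->can_decrement) {
		for (i = 0; i < a->ndevs; i++)
			if (a->devs[i].state == ACTIVE)
				a->devs[i].events--;
	} else {
		for (pass = 0; pass < 2; pass++)
			for (i = 0; i < a->ndevs; i++)
				if (a->devs[i].state ==
				    (pass == 0 ? SPARE : ACTIVE))
					a->devs[i].events = target;
	}
	a->clean = true;
}

/* A failure is recorded on every surviving superblock; after that,
 * decrementing could make the failed device look like a current
 * member again, so forbid it. */
static void fail_device(struct array *a, size_t i)
{
	size_t j;

	a->devs[i].state = FAILED;
	for (j = 0; j < a->ndevs; j++)
		if (a->devs[j].state != FAILED)
			a->devs[j].events++;
	a->clean = false;
	a->can_decrement = false;
}

With the flag held in memory and assumed false at startup, the
decision no longer depends on the parity of the on-disk event count,
and the spares-first ordering in mark_clean bounds how far a spare can
fall behind if we lose the crash race described above.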