Re: Likely forced assembly with wrong disk during raid5 grow. Recoverable?

On Sun, 20 Feb 2011 15:44:35 +0100 Claude Nobs <claudenobs@xxxxxxxxx> wrote:

> > They are the 'Number' column in the --detail output below.  This is /dev/md1
> > - I can tell from the --examine outputs, but it is a bit confusing.  Newer
> > versions of mdadm make this a little less confusing.  If you look for
> > patterns of U and u  in the 'Array State' line, the U is 'this device', the
> > 'u' is some other devices.
> 
> Actually this is running a stock Ubuntu 10.10 server kernel. But as
> it is from my memory it could very well have been :
> 
>        2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]
> 

I'm quite sure it would have been '[U_UUU]' as you say.

When I say "Newer versions" I mean of mdadm, not the kernel.

What does
   mdadm -V

show?  Version 3.0 or later gives less confusing output for "mdadm --examine"
on 1.x metadata.
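
From memory - so treat the exact field names as approximate - 3.x reports
each device's slot explicitly instead of the U/u pattern, something like:

   Device Role : Active device 1
   Array State : AAAAA ('A' == active, '.' == missing)

which makes it much clearer which slot a given --examine output belongs to.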

> > Just to go through some of the numbers...
> >
> > Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
> > So old stripes have 192K, new stripes have 256K.
> >
> > The 'good' disks think reshape has reached 502815488K which is
> > 1964123 new stripes. (2618830.66 old stripes)
> > md1 thinks reshape has only reached 489510400K which is 1912150
> > new stripes (2549533.33 old stripes).
> 
> i think you mixed up sdd1 with md1 here? (the numbers above for md1
> are for sdd1. md1 would be :  reshape has reached 502809856K which
> would be 1964101 new stripes. so the difference between the good disks
> and md1 would be 22 stripes.)

Yes, I got them mixed up.  But the net result is the same - the 'new' stripe
numbers haven't come close to overwriting the 'old' stripe numbers.
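
If it helps to have the arithmetic in one place, here is a quick sketch in C.
The reshape positions (in K) are simply the numbers quoted above, hard-coded -
nothing is read from the real metadata:

	/* stripe arithmetic for the 4->5 grow discussed in this thread */
	#include <stdio.h>

	int main(void)
	{
		long long chunk = 64;              /* chunk size in K */
		long long old_stripe = 3 * chunk;  /* 3 data disks before the grow: 192K */
		long long new_stripe = 4 * chunk;  /* 4 data disks after the grow: 256K */
		long long good = 502815488LL;      /* reshape position on the 'good' disks */
		long long md1  = 502809856LL;      /* reshape position recorded on md1 */
		long long sdd1 = 489510400LL;      /* reshape position recorded on sdd1 */

		printf("good: %lld new stripes (%.2f old stripes)\n",
		       good / new_stripe, (double)good / old_stripe);
		printf("md1:  %lld new stripes, %lld behind\n",
		       md1 / new_stripe, (good - md1) / new_stripe);
		printf("sdd1: %lld new stripes, %lld behind\n",
		       sdd1 / new_stripe, (good - sdd1) / new_stripe);
		return 0;
	}

That prints the 22 stripe gap for md1 and the 51973 stripe gap for sdd1 that
we have been talking about.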

> 
> >
> > So of the 51973 stripes that have been reshaped since the last metadata
> > update on sdd1, some will have been done on sdd1, but some not, and we don't
> > really know how many.  But it is perfectly safe to repeat those stripes
> > as all writes to that region will have been suspended (and you probably
> > weren't writing anyway).
> 
> jep there was nothing writing to the array. so now i am a little
> confused, if you meant sdd1 (which failed first is 51973 stripes
> behind) this would imply that at least so many stripes of data are
> kept of the old (3 data disks) configuration as well as the new one?
> if continuing from there is possible then the array would no longer be
> degraded right? so i think you meant md1 (22 stripes behind), as
> keeping 5.5M of data from the old and new config seems more
> reasonable. however this is just a guess :-)

Yes, it probably is possible to re-assemble the array to include sdd1 and not
have a degraded array, and still have all your data safe - providing you are
sure that nothing at all changed on the array (e.g. maybe it was unmounted?).

I'm not sure I'd recommend it though....  I cannot see anything that would go
wrong, but it is somewhat unknown territory.
Up to you...

If you:

% git clone git://neil.brown.name/mdadm mdadm
% cd mdadm
% make
% sudo bash
# ./mdadm -S /dev/md2
# ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1

It should restart your array - degraded - and repeat the last stages of
reshape just in case.

Alternately, before you run 'make' you could edit Assemble.c, find:
	while (force && !enough(content->array.level, content->array.raid_disks,
				content->array.layout, 1,
				avail, okcnt)) {

around line 818, and change the '1,' to '0,', then run make, mdadm -S, and
then
# ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1 /dev/sdd1

It should assemble the array non-degraded and repeat all of the reshape since
sdd1 fell out of the array.
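
For clarity, after that one-character change the loop header should read like
this (line numbers are approximate and drift between mdadm versions):

	/* same call as above, but with the 'clean' argument forced to 0 */
	while (force && !enough(content->array.level, content->array.raid_disks,
				content->array.layout, 0,
				avail, okcnt)) {

That fourth argument is the 'clean' flag; with it set to 0, enough() wants all
raid_disks present for RAID5, so the force code keeps bringing devices up to
date until sdd1 is accepted as well, instead of stopping at a degraded set.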

As you have a backup, this is probably safe because even if it goes bad you
can restore from backups - not that I expect it to go bad but ....

> >
> > Thanks for the excellent problem report.
> >
> > NeilBrown
> 
> Well i thank you for providing such an elaborate and friendly answer!
> this is actually my first mailing list post and considering how many
> questions get ignored (don't know about this list though) i just hoped
> someone would at least answer with a one liner... i never expected
> this. so thanks again.

All part of the service... :-)

NeilBrown


