Re: Likely forced assembly with wrong disk during raid5 grow. Recoverable?


 



On Mon, Feb 21, 2011 at 01:53, NeilBrown <neilb@xxxxxxx> wrote:
>
> When I say "Newer versions" I mean of mdadm, not the kernel.
>
> What does
>   mdadm -V
>
> show? Version 3.0 or later gives less confusing output for "mdadm --examine"
> on 1.x metadata.

mdadm - v2.6.7.1 - 15th October 2008
so yes, the Ubuntu mdadm is indeed a very old version

> Yes, it probably is possible to re-assemble the array to include sdd1 and not
> have a degraded array, and still have all your data safe - providing you are
> sure that nothing at all changed on the array (e.g. maybe it was unmounted?).
>
> I'm not sure I'd recommend it though... I cannot see anything that would go
> wrong, but it is somewhat unknown territory.
> Up to you...
>
> If you:
>
> % git clone git://neil.brown.name/mdadm
> % cd mdadm
> % make
> % sudo bash
> # ./mdadm -S /dev/md2
> # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1
>
> It should restart your array - degraded - and repeat the last stages of
> reshape just in case.
>
> Alternately, before you run 'make' you could edit Assemble.c, find:
>        while (force && !enough(content->array.level, content->array.raid_disks,
>                                content->array.layout, 1,
>                                avail, okcnt)) {
>
> around line 818, and change the '1,' to '0,', then run make, mdadm -S, and
> then
> # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1 /dev/sdd1
>
> it should assemble the array non-degraded and repeat all of the reshape since
> sdd1 fell out of the array.
>
> As you have a backup, this is probably safe because even if it goes bad you
> can restore from backups - not that I expect it to go bad but ....

I tried to recreate the scenario so I could test both versions first,
but I just could not reproduce this step (or rather its result: a
different reshape position on the last 3+1 drives):

bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
mdadm: Failed to restore critical section for reshape, sorry.

which I think led to the inconsistent state. All I got was:

$ sudo mdadm --create /dev/md4 --level raid5 --metadata=1.2 --raid-devices=4 /dev/sde[5678]
$ sudo mkfs.ext4 /dev/md4
$ sudo mdadm --add /dev/md4 /dev/sde9
$ sudo mdadm --grow --raid-devices 5 /dev/md4
$ sudo mdadm /dev/md4 --fail /dev/sde9
$ sudo umount /dev/md4 && sudo mdadm -S /dev/md4
$ sudo reboot
$ sudo mdadm -S /dev/md4
$ sudo mdadm --assemble --run /dev/md4 /dev/sde[6789]
mdadm: failed to RUN_ARRAY /dev/md4: Input/output error
mdadm: Not enough devices to start the array.
$ sudo mdadm --examine /dev/sde[56789]
/dev/sde5:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
     Array Slot : 0 (0, 1, 2, failed, failed, failed)
    Array State : Uuu__ 3 failed
/dev/sde6:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
     Array Slot : 1 (0, 1, 2, failed, failed, failed)
    Array State : uUu__ 3 failed
/dev/sde7:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
     Array Slot : 2 (0, 1, 2, failed, failed, failed)
    Array State : uuU__ 3 failed
/dev/sde8:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:15 2011
     Array Slot : 4 (0, 1, 2, failed, 3, failed)
    Array State : uuuU_ 2 failed
/dev/sde9:
  Reshape pos'n : 54016 (52.76 MiB 55.31 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:11 2011
     Array Slot : 5 (0, 1, 2, failed, 3, 4)
    Array State : uuuuU 1 failed

which was then instantly and correctly reshaped by the freshly
compiled version. Without any more real testing, I chose the safer way
and went ahead on the real array:

bernstein@server:~/mdadm$ sudo ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1
mdadm: looking for devices for /dev/md2
mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 4.
mdadm: /dev/md0 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/md1 is identified as a member of /dev/md2, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 0.
mdadm: forcing event count in /dev/md1(2) from 133603 upto 133609
mdadm: Cannot open /dev/sdc1: Device or resource busy
bernstein@server:~/mdadm$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid5 md1[3] md0[4] sda1[5] sdc1[0]
      2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [U_UUU]
      [==>..................]  reshape = 12.8% (125839952/976760640) finish=825.1min speed=17186K/sec

md1 : active raid0 sdg1[1] sdf1[0]
      976770944 blocks super 1.2 64k chunks

md0 : active raid0 sdh1[0] sdb1[1]
      976770944 blocks super 1.2 64k chunks

unused devices: <none>

The reshape is in progress and looks on track to complete overnight,
although I am a little scared about the "mdadm: forcing event count in
/dev/md1(2) from 133603 upto 133609" and the "device busy" lines. Is
this the way it's supposed to be? I assumed that when it repeats all
of the reshape it would instead say something like: forcing event
count in /dev/sda1, md0, sdc1 from 133609 downto 133603...
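As a quick sanity check on the overnight estimate, mdadm's finish figure can be recomputed from the /proc/mdstat snapshot above. This is only a sketch: the variable names are mine, and the units (positions in 1K blocks, speed in K/sec) are assumptions read off the mdstat output.

```shell
# Recompute the reshape ETA from the /proc/mdstat figures quoted above.
total=976760640          # total blocks to reshape
done_blocks=125839952    # blocks already reshaped
speed=17186              # current speed in K/sec
secs=$(( (total - done_blocks) / speed ))
mins=$(( secs / 60 ))
echo "estimated finish: ${mins} min"   # close to mdadm's own finish=825.1min
```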

This is not strictly a raid/mdadm question, but do you know a simple
way to check that everything went OK? I think an e2fsck (ext4 fs) plus
checksumming some random files located beyond the interruption point
should verify all went well. Just to be sure, I'd also like to check
files located at the interruption point. Is the offset of the
interruption point into the md device simply the reshape pos'n (e.g.
502815488K)?
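On that last point: the --examine output above quotes the reshape position in 1K blocks (consistent with its MiB/MB conversions), so assuming the same unit here, converting it to byte and sector offsets is just a multiplication — a sketch, with the 502815488K figure taken from the question and the variable names my own:

```shell
# Convert a reshape position given in 1K blocks to byte and 512-byte-sector
# offsets into the md device (useful for dd skip=..., filefrag, or debugfs).
pos_kib=502815488                # reshape pos'n from --examine (assumed 1K units)
bytes=$(( pos_kib * 1024 ))
sectors=$(( pos_kib * 2 ))
echo "byte offset:   ${bytes}"
echo "sector offset: ${sectors}"
```

Files crossing that offset could then be located with e.g. `filefrag -v` on candidate files (or debugfs's `icheck` on the filesystem block number) and checksummed against the backup.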

> All part of the service... :-)

Well then, great service!
Thanks a lot.

Claude
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

