Likely forced assembly with wrong disk during raid5 grow. Recoverable?

Hi All,

I was wondering if someone might be willing to tell me whether this array is
recoverable.

I had a clean, running RAID5 array built from 4 block devices (two of which
were 2-disk RAID0 md devices). Last night I decided it was safe to grow the
array by one disk. But then a) a disk failed, b) a power loss occurred, and
c) I probably swapped out the wrong disk and forced assembly, resulting in
an inconsistent state. Here is the complete set of actions taken:

> bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
> mdadm: Need to backup 768K of critical section..
> mdadm: ... critical section passed.
> bernstein@server:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md1 : active raid0 sdg1[1] sdf1[0]
>       976770944 blocks super 1.2 64k chunks
>
> md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
>
> md0 : active raid0 sdh1[0] sdb1[1]
>       976770944 blocks super 1.2 64k chunks
>
> unused devices: <none>


Now I thought /dev/sdg1 had failed. Unfortunately I have no log for this
one, just my memory of seeing the mdstat line above change to:

>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]

Some 10 minutes later a power loss occurred; thanks to a UPS the server shut
down cleanly, as with 'shutdown -h now'. I then exchanged /dev/sdg1,
rebooted, and in a lapse of judgement forced assembly:

> bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
> mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
> mdadm: Failed to restore critical section for reshape, sorry.
>
> bernstein@server:~$ sudo mdadm --detail /dev/md2
> /dev/md2:
>         Version : 01.02
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>    Raid Devices : 5
>   Total Devices : 3
> Preferred Minor : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Feb 19 22:32:04 2011
>           State : active, degraded, Not Started
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>   Delta Devices : 1, (4->5)
>
>            Name : master:public
>            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>          Events : 133609
>
>     Number   Major   Minor   RaidDevice State
>        0       8       33        0      active sync   /dev/sdc1
>        1       0        0        1      removed
>        2       0        0        2      removed
>        4       9        0        3      active sync   /dev/block/9:0
>        5       8        1        4      active sync   /dev/sda1

So I reattached the old disk, got /dev/md1 back, and did the investigation I
should have done before:

> bernstein@server:~$ sudo mdadm --examine /dev/sdd1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
>
>   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:23:09 2011
>        Checksum : fd0c1794 - correct
>          Events : 133567
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>      Array Slot : 1 (0, 1, failed, 2, 3, 4)
>     Array State : uUuuu 1 failed
> bernstein@server:~$ sudo mdadm --examine /dev/sda1
> /dev/sda1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 12c832c6 - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>      Array Slot : 5 (0, failed, failed, failed, 3, 4)
>     Array State : u__uU 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/sdc1
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 8aa7d094 - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>      Array Slot : 0 (0, failed, failed, failed, 3, 4)
>     Array State : U__uu 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/md0
> /dev/md0:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 1bbf913b - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>      Array Slot : 4 (0, failed, failed, failed, 3, 4)
>     Array State : u__Uu 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/md1
> /dev/md1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
>
>   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:30:29 2011
>        Checksum : 6c591e90 - correct
>          Events : 133603
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>      Array Slot : 3 (0, failed, failed, 2, 3, 4)
>     Array State : u_Uuu 2 failed

So obviously it was not /dev/sdd1 that failed. However (due to that silly
forced assembly?!) the Reshape pos'n field of md0 and sd[ac]1 is now a few
megabytes ahead of md1's, and their event counts differ too, leaving the
members in an inconsistent state...
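
For reference, here is a quick way to pull those fields out side by side
(just a sketch of my own using the device names above; mdadm --examine only
reads the superblocks, so nothing here writes to the array):

  for dev in /dev/sdc1 /dev/sdd1 /dev/md0 /dev/md1 /dev/sda1; do
      echo "== $dev"
      sudo mdadm --examine "$dev" | grep -E "Reshape pos'n|Update Time|Events"
  done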

> bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
>
> mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
> bernstein@server:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
>       4883823704 blocks super 1.2
>
> md1 : active raid0 sdf1[0] sdg1[1]
>       976770944 blocks super 1.2 64k chunks
>
> md0 : active raid0 sdb1[1] sdh1[0]
>       976770944 blocks super 1.2 64k chunks
>
> unused devices: <none>

I do have a backup, but since restoring from it would take a few days, I'd
like to know whether there is a way to recover the array or whether it's
completely lost.
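
(What I keep wondering about, but have not dared to run without advice, is
another forced assembly that also hands mdadm the backup file from the grow.
Just a sketch of what I mean, not something I have tried:

  sudo mdadm --assemble --force --backup-file=/raid.grow.backupfile /dev/md2 \
      /dev/sdc1 /dev/sdd1 /dev/md1 /dev/md0 /dev/sda1

I have no idea whether --force can safely reconcile the differing event
counts and reshape positions here, which is exactly why I'm asking first.)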

Any suggestions gratefully received,

claude