On Fri, Feb 27, 2009 at 11:56 PM, Andrey Falko <ma3oxuct@xxxxxxxxx> wrote:
> Hi everyone,
>
> I'm having some strange problems putting one of my raid5 arrays back
> together. Here is the background story:
>
> I have 4 drives partitioned into a number of raid arrays. One of the
> drives failed and I replaced it with a new one. I was able to get
> mdadm to recover all of the arrays except one raid5 array. The array
> with trouble is /dev/md8, and it is supposed to have /dev/sd[abcd]13
> under it.
>
> This command started the recovery process (the same thing that worked
> for my other raid5 arrays):
> mdadm --manage --add /dev/md8 /dev/sdc13
>
> md8 : active raid5 sdc13[4] sdd13[3] sdb13[1] sda13[0]
>       117185856 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
>       [>....................]  recovery =  1.6% (634084/39061952) finish=12.1min speed=52840K/sec
>
> However, sometime after the recovery passed 1.6%, I did a "cat /proc/mdstat" and saw:
>
> md8 : active raid5 sdc13[4](S) sdd13[3] sdb13[1] sda13[5](F)
>       117185856 blocks level 5, 64k chunk, algorithm 2 [4/2] [_U_U]
>
> /dev/sda only "failed" for this array and not for any of the other
> arrays. I then tried removing and re-adding /dev/sda13 and /dev/sdc13,
> but that did not work. I ran the following:
>
> # mdadm --examine /dev/sda13
> /dev/sda13:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ec0fab43:4fb8d991:e6a58c12:482d89e4
>   Creation Time : Sat Sep 15 00:55:37 2007
>      Raid Level : raid5
>   Used Dev Size : 39061952 (37.25 GiB 40.00 GB)
>      Array Size : 117185856 (111.76 GiB 120.00 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 8
>
>     Update Time : Sat Feb 28 00:49:48 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 2
>        Checksum : a6b02b9d - correct
>          Events : 0.36
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     4       8       13        4      spare   /dev/sda13
>
>    0     0       0        0        0      removed
>    1     1       8       29        1      active sync   /dev/sdb13
>    2     2       0        0        2      faulty removed
>    3     3       8       61        3      active sync   /dev/sdd13
>    4     4       8       13        4      spare   /dev/sda13
>    5     5       8       45        5      spare   /dev/sdc13
> # mdadm --examine /dev/sdb13
> /dev/sdb13:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ec0fab43:4fb8d991:e6a58c12:482d89e4
>   Creation Time : Sat Sep 15 00:55:37 2007
>      Raid Level : raid5
>   Used Dev Size : 39061952 (37.25 GiB 40.00 GB)
>      Array Size : 117185856 (111.76 GiB 120.00 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 8
>
>     Update Time : Sat Feb 28 00:49:48 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 2
>        Checksum : a6b02bad - correct
>          Events : 0.36
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     1       8       29        1      active sync   /dev/sdb13
>
>    0     0       0        0        0      removed
>    1     1       8       29        1      active sync   /dev/sdb13
>    2     2       0        0        2      faulty removed
>    3     3       8       61        3      active sync   /dev/sdd13
>    4     4       8       13        4      spare   /dev/sda13
>    5     5       8       45        5      spare   /dev/sdc13
> # mdadm --examine /dev/sdc13
> /dev/sdc13:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ec0fab43:4fb8d991:e6a58c12:482d89e4
>   Creation Time : Sat Sep 15 00:55:37 2007
>      Raid Level : raid5
>   Used Dev Size : 39061952 (37.25 GiB 40.00 GB)
>      Array Size : 117185856 (111.76 GiB 120.00 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 8
>
>     Update Time : Sat Feb 28 00:49:48 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 2
>        Checksum : a6b02bbf - correct
>          Events : 0.36
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     5       8       45        5      spare   /dev/sdc13
>
>    0     0       0        0        0      removed
>    1     1       8       29        1      active sync   /dev/sdb13
>    2     2       0        0        2      faulty removed
>    3     3       8       61        3      active sync   /dev/sdd13
>    4     4       8       13        4      spare   /dev/sda13
>    5     5       8       45        5      spare   /dev/sdc13
> # mdadm --examine /dev/sdd13
> /dev/sdd13:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ec0fab43:4fb8d991:e6a58c12:482d89e4
>   Creation Time : Sat Sep 15 00:55:37 2007
>      Raid Level : raid5
>   Used Dev Size : 39061952 (37.25 GiB 40.00 GB)
>      Array Size : 117185856 (111.76 GiB 120.00 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 8
>
>     Update Time : Sat Feb 28 00:49:48 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 2
>        Checksum : a6b02bd1 - correct
>          Events : 0.36
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     3       8       61        3      active sync   /dev/sdd13
>
>    0     0       0        0        0      removed
>    1     1       8       29        1      active sync   /dev/sdb13
>    2     2       0        0        2      faulty removed
>    3     3       8       61        3      active sync   /dev/sdd13
>    4     4       8       13        4      spare   /dev/sda13
>    5     5       8       45        5      spare   /dev/sdc13
>
>
> At this point I started googling and ended up doing this:
>
> # mdadm --stop /dev/md8
> # mdadm --zero-superblock /dev/sda13
> # mdadm --zero-superblock /dev/sdb13
> # mdadm --zero-superblock /dev/sdc13
> # mdadm --zero-superblock /dev/sdd13
> # mdadm -A /dev/md8 /dev/sda13 /dev/sdb13 /dev/sdc13 /dev/sdd13 --force
> mdadm: no recogniseable superblock on /dev/sda13
> mdadm: /dev/sda13 has no superblock - assembly aborted
> # mdadm --create /dev/md8 -l 5 -n 4 /dev/sda13 /dev/sdb13 /dev/sdc13 /dev/sdd13
> mdadm: /dev/sda13 appears to contain an ext2fs file system
>     size=117185856K  mtime=Fri Feb 27 17:22:46 2009
> mdadm: /dev/sdd13 appears to contain an ext2fs file system
>     size=117185856K  mtime=Fri Feb 27 17:22:46 2009
> Continue creating array? y
> mdadm: array /dev/md8 started.
> # cat /proc/mdstat
> md8 : active raid5 sdd13[4] sdc13[2] sdb13[1] sda13[0]
>       117185856 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       [>....................]  recovery =  0.7% (294244/39061952) finish=10.9min speed=58848K/sec
>
> Again, very shortly after the recovery passed 1.6%, it failed. Now I see the following:
>
> md8 : active raid5 sdd13[4](S) sdc13[2] sdb13[1] sda13[5](F)
>       117185856 blocks level 5, 64k chunk, algorithm 2 [4/2] [_UU_]
>
> This surely does not look good. I hope that /dev/sda13 did not get
> wrongly synced. Does anyone have suggestions for how I should recover
> this array? And does anyone have any idea what could have caused these
> issues? How can /dev/sda13, a healthy part of the array, lose its
> superblock?
>
> Let me know if you'd like any more info. The mdadm version is 2.6.4,
> and the kernel version is 2.6.24.3.
>
> Thanks in advance for any help,
> Andrey

Hi everyone,

I upgraded mdadm from 2.6.4 to 2.6.8 and re-tried the last procedure
above. This time the recovery got to 16.2%, stalled there for about
8 seconds, and then showed me this:

md8 : active raid5 sdd13[4](S) sdc13[2] sdb13[1] sda13[5](F)
      117185856 blocks level 5, 64k chunk, algorithm 2 [4/2] [_UU_]
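
Since md keeps kicking /dev/sda13 out of the array partway through every
rebuild attempt, I'm starting to suspect an unreadable sector on /dev/sda
itself rather than a problem with the array metadata. If it would help
with the diagnosis, I can post the kernel log and the SMART data for
/dev/sda from around the time of the failure; I'd gather them roughly
like this (assuming smartmontools is installed on this box):

# dmesg | tail -n 100        (look for ata/end_request read errors from the rebuild)
# smartctl -H /dev/sda       (overall SMART health verdict)
# smartctl -a /dev/sda       (full attributes; reallocated/pending sector counts in particular)
# smartctl -t long /dev/sda  (surface self-test; results show up later via "smartctl -l selftest /dev/sda")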
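
From what I've found in the list archives so far, the next thing I'm
tempted to try is re-creating the array with the slot that was being
rebuilt left out ("missing" in slot 2, where the replaced disk sat) and
--assume-clean, so that no resync is started at all and I can try to
mount the filesystem read-only and copy the data off. Something along
these lines; the device order is my best guess from the old mdstat
output, the chunk size and layout are taken from the --examine output
above, and /mnt/recovery is just a scratch mount point, so please tell
me if this is the wrong idea before I dig the hole any deeper:

# mdadm --stop /dev/md8
# mdadm --create /dev/md8 --assume-clean -l 5 -n 4 -c 64 \
        --layout=left-symmetric /dev/sda13 /dev/sdb13 missing /dev/sdd13
# mount -o ro /dev/md8 /mnt/recovery

Thanks,
Andrey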