On Fri, Feb 27, 2009 at 11:56 PM, Andrey Falko <ma3oxuct@xxxxxxxxx> wrote:
> Hi everyone,
>
> I'm having some strange problems putting one of my raid5 arrays back
> together. Here is the background story:
>
> I have 4 drives partitioned into a number of raid arrays. One of the
> drives failed and I replaced it with a new one. I was able to get
> mdadm to recover all of the arrays except one raid5 array. The array
> with trouble is /dev/md8, and it is supposed to have /dev/sd[abcd]13
> under it.
>
> This command started the recovery process (the same thing that worked
> for my other raid5 arrays):
> mdadm --manage --add /dev/md8 /dev/sdc13
>
> md8 : active raid5 sdc13[4] sdd13[3] sdb13[1] sda13[0]
>       117185856 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
>       [>....................]  recovery =  1.6% (634084/39061952) finish=12.1min speed=52840K/sec
>
> However, sometime after the recovery passed 1.6%, I did a "cat /proc/mdstat" and saw:
>
> md8 : active raid5 sdc13[4](S) sdd13[3] sdb13[1] sda13[5](F)
>       117185856 blocks level 5, 64k chunk, algorithm 2 [4/2] [_U_U]
>
> /dev/sda only "failed" for this array and not for any of the other
> arrays. I then tried removing and re-adding /dev/sda13 and /dev/sdc13,
> but that did not work. I ran the following:
>
> # mdadm --examine /dev/sda13
> /dev/sda13:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ec0fab43:4fb8d991:e6a58c12:482d89e4
>   Creation Time : Sat Sep 15 00:55:37 2007
>      Raid Level : raid5
>   Used Dev Size : 39061952 (37.25 GiB 40.00 GB)
>      Array Size : 117185856 (111.76 GiB 120.00 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 8
>
>     Update Time : Sat Feb 28 00:49:48 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 2
>        Checksum : a6b02b9d - correct
>          Events : 0.36
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     4       8       13        4      spare   /dev/sda13
>
>    0     0       0        0        0      removed
>    1     1       8       29        1      active sync   /dev/sdb13
>    2     2       0        0        2      faulty removed
>    3     3       8       61        3      active sync   /dev/sdd13
>    4     4       8       13        4      spare   /dev/sda13
>    5     5       8       45        5      spare   /dev/sdc13
> # mdadm --examine /dev/sdb13
> /dev/sdb13:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ec0fab43:4fb8d991:e6a58c12:482d89e4
>   Creation Time : Sat Sep 15 00:55:37 2007
>      Raid Level : raid5
>   Used Dev Size : 39061952 (37.25 GiB 40.00 GB)
>      Array Size : 117185856 (111.76 GiB 120.00 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 8
>
>     Update Time : Sat Feb 28 00:49:48 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 2
>        Checksum : a6b02bad - correct
>          Events : 0.36
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     1       8       29        1      active sync   /dev/sdb13
>
>    0     0       0        0        0      removed
>    1     1       8       29        1      active sync   /dev/sdb13
>    2     2       0        0        2      faulty removed
>    3     3       8       61        3      active sync   /dev/sdd13
>    4     4       8       13        4      spare   /dev/sda13
>    5     5       8       45        5      spare   /dev/sdc13
> # mdadm --examine /dev/sdc13
> /dev/sdc13:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ec0fab43:4fb8d991:e6a58c12:482d89e4
>   Creation Time : Sat Sep 15 00:55:37 2007
>      Raid Level : raid5
>   Used Dev Size : 39061952 (37.25 GiB 40.00 GB)
>      Array Size : 117185856 (111.76 GiB 120.00 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 8
>
>     Update Time : Sat Feb 28 00:49:48 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 2
>        Checksum : a6b02bbf - correct
>          Events : 0.36
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     5       8       45        5      spare   /dev/sdc13
>
>    0     0       0        0        0      removed
>    1     1       8       29        1      active sync   /dev/sdb13
>    2     2       0        0        2      faulty removed
>    3     3       8       61        3      active sync   /dev/sdd13
>    4     4       8       13        4      spare   /dev/sda13
>    5     5       8       45        5      spare   /dev/sdc13
> # mdadm --examine /dev/sdd13
> /dev/sdd13:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ec0fab43:4fb8d991:e6a58c12:482d89e4
>   Creation Time : Sat Sep 15 00:55:37 2007
>      Raid Level : raid5
>   Used Dev Size : 39061952 (37.25 GiB 40.00 GB)
>      Array Size : 117185856 (111.76 GiB 120.00 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 8
>
>     Update Time : Sat Feb 28 00:49:48 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 2
>        Checksum : a6b02bd1 - correct
>          Events : 0.36
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     3       8       61        3      active sync   /dev/sdd13
>
>    0     0       0        0        0      removed
>    1     1       8       29        1      active sync   /dev/sdb13
>    2     2       0        0        2      faulty removed
>    3     3       8       61        3      active sync   /dev/sdd13
>    4     4       8       13        4      spare   /dev/sda13
>    5     5       8       45        5      spare   /dev/sdc13
>
>
> At this point I started googling and ended up doing this:
>
> # mdadm --stop /dev/md8
> # mdadm --zero-superblock /dev/sda13
> # mdadm --zero-superblock /dev/sdb13
> # mdadm --zero-superblock /dev/sdc13
> # mdadm --zero-superblock /dev/sdd13
> # mdadm -A /dev/md8 /dev/sda13 /dev/sdb13 /dev/sdc13 /dev/sdd13 --force
> mdadm: no recogniseable superblock on /dev/sda13
> mdadm: /dev/sda13 has no superblock - assembly aborted
> # mdadm --create /dev/md8 -l 5 -n 4 /dev/sda13 /dev/sdb13 /dev/sdc13 /dev/sdd13
> mdadm: /dev/sda13 appears to contain an ext2fs file system
>     size=117185856K  mtime=Fri Feb 27 17:22:46 2009
> mdadm: /dev/sdd13 appears to contain an ext2fs file system
>     size=117185856K  mtime=Fri Feb 27 17:22:46 2009
> Continue creating array? y
> mdadm: array /dev/md8 started.
> # cat /proc/mdstat
> md8 : active raid5 sdd13[4] sdc13[2] sdb13[1] sda13[0]
>       117185856 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       [>....................]  recovery =  0.7% (294244/39061952) finish=10.9min speed=58848K/sec
>
> Again, very shortly after the recovery passed 1.6%, it failed. Now I see the following:
>
> md8 : active raid5 sdd13[4](S) sdc13[2] sdb13[1] sda13[5](F)
>       117185856 blocks level 5, 64k chunk, algorithm 2 [4/2] [_UU_]
>
> This surely does not look good. I hope that /dev/sda13 did not get
> wrongly synced. Does anyone have suggestions for how I should recover
> this array? And does anyone have any idea what could have caused these
> issues? How can /dev/sda13, a healthy part of the array, lose its
> superblock?
>
> Let me know if you'd like any more info. The mdadm version is 2.6.4,
> and the kernel version is 2.6.24.3.
>
> Thanks in advance for any help,
> Andrey

Hi everyone,

I upgraded mdadm from 2.6.4 to 2.6.8 and re-tried the last procedure
above. This time the recovery got to 16.2%, stalled there for about
8 seconds, and then showed me this:

md8 : active raid5 sdd13[4](S) sdc13[2] sdb13[1] sda13[5](F)
      117185856 blocks level 5, 64k chunk, algorithm 2 [4/2] [_UU_]
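
Since md keeps kicking /dev/sda13 out of the array partway through every
rebuild attempt, I'm starting to suspect an unreadable sector on /dev/sda
itself rather than a problem with the array metadata. If it would help
with the diagnosis, I can post the kernel log and the SMART data for
/dev/sda from around the time of the failure; I'd gather them roughly
like this (assuming smartmontools is installed on this box):

# dmesg | tail -n 100        (look for ata/end_request read errors from the rebuild)
# smartctl -H /dev/sda       (overall SMART health verdict)
# smartctl -a /dev/sda       (full attributes; reallocated/pending sector counts in particular)
# smartctl -t long /dev/sda  (surface self-test; results show up later via "smartctl -l selftest /dev/sda")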
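
From what I've found in the list archives so far, the next thing I'm
tempted to try is re-creating the array with the slot that was being
rebuilt left out ("missing" in slot 2, where the replaced disk sat) and
--assume-clean, so that no resync is started at all and I can try to
mount the filesystem read-only and copy the data off. Something along
these lines; the device order is my best guess from the old mdstat
output, the chunk size and layout are taken from the --examine output
above, and /mnt/recovery is just a scratch mount point, so please tell
me if this is the wrong idea before I dig the hole any deeper:

# mdadm --stop /dev/md8
# mdadm --create /dev/md8 --assume-clean -l 5 -n 4 -c 64 \
        --layout=left-symmetric /dev/sda13 /dev/sdb13 missing /dev/sdd13
# mount -o ro /dev/md8 /mnt/recovery

Thanks,
Andrey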