Hi,

My machine suffered a system crash a couple of days ago. Although the OS appeared to be still running, there was no means of input from any external device (except the power switch), so I power-cycled it. When it came back up, it was obvious that there was a problem with the RAID 10 array containing my /home partition (c. 2TB). The crash was only the latest of a recent series.

First, I ran some diagnostics, whose results are printed in the second text attachment to this email (the first attachment tells you what I know about the current state of the array, i.e. after my intervention). The results shown in the second attachment, together with the recent crashes and some previous experience, led me to believe that the four partitions in the array were not actually (or seriously) damaged, but simply out of sync.

So I looked up the linux-raid mailing list thread in which I had reported my previous problem:

    http://www.spinics.net/lists/raid/msg22811.html

Unfortunately, in a moment of reckless hope and blind panic, I then did something very stupid: I applied the 'solution' which Neil Brown had recommended for my previous RAID failures, without thinking through the differences in the new context. I realised this stupidity at almost exactly the moment the ENTER key sprang back up after sending the following command:

$ sudo mdadm --assemble --force --verbose /dev/md1 /dev/sdf4 /dev/sdg4 /dev/sdh4 /dev/sdi4

This produced the following results some time later:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md_d0 : inactive sdi2[0](S)
      9767424 blocks

md1 : active raid10 sdf4[4] sdg4[1] sdh4[2]
      1931767808 blocks 64K chunks 2 near-copies [4/2] [_UU_]
      [=====>...............]  recovery = 29.4% (284005568/965883904) finish=250.0min speed=45440K/sec

unused devices: <none>

$ sudo mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun May 30 00:25:19 2010
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

         Layout : near=2, far=1
     Chunk Size : 64K

 Rebuild Status : 25% complete

           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
         Events : 0.8079536

    Number   Major   Minor   RaidDevice State
       4       8       84        0      spare rebuilding   /dev/sdf4
       1       8      100        1      active sync   /dev/sdg4
       2       8      116        2      active sync   /dev/sdh4
       3       0        0        3      removed

This result temporarily raised my hopes, because it indicated recovery in a degraded state, and I had read somewhere (http://www.aput.net/~jheiss/raid10/) that 'degraded' meant the array had "lost one or more drives but has not lost the right combination of drives to completely fail".

Unfortunately, this result also raised my fears, because the "RaidDevice State" table indicated that the array was treating /dev/sdf4 as the spare and writing to it, whereas I believed that /dev/sdf4 was supposed to be a full member of the array, and that /dev/sdj4 was supposed to be the spare. I think this belief is confirmed by these data on /dev/sdj4 (from the second attachment):

    Update Time : Tue Oct  6 18:01:45 2009
         Events : 370

It may be too late, but at this point I came to my senses and resolved to stop tinkering and to email the following questions instead.

QUESTION 1: Have I now wrecked any chance of recovering the data, or have I been lucky enough to retain enough data to rebuild the entire array by employing /dev/sdi4 and/or /dev/sdj4?

QUESTION 2: If I have had 'the luck of the stupid', how do I proceed safely with the recovery?
QUESTION 3: If I have NOT been unfeasibly lucky, is there any way of recovering some of the data files from the raw partitions?

N.B. I would be more than happy to recover the data as of the date shown by /dev/sdi4's update time. The non-backed-up, business-critical data has not been modified in several weeks.

I hope you can help; I'd be desperately grateful for it.

Best wishes,

Dave Fisher
--- Attachment 1: current state of the array (after my intervention) ---

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md_d0 : inactive sdi2[0](S)
      9767424 blocks

md1 : active raid10 sdf4[4](F) sdg4[5](F) sdh4[2]
      1931767808 blocks 64K chunks 2 near-copies [4/1] [__U_]

unused devices: <none>

$ sudo mdadm -E /dev/sd{f,g,h,i,j}4
/dev/sdf4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Sun May 30 04:47:20 2010
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 7d4a18fc - correct
         Events : 8079558

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       84        4      spare   /dev/sdf4

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       0        0        3      faulty removed
   4     4       8       84        4      spare   /dev/sdf4

/dev/sdg4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Sun May 30 04:25:29 2010
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 7d4a13de - correct
         Events : 8079557

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8      100        1      active sync   /dev/sdg4

   0     0       0        0        0      removed
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       0        0        3      faulty removed
   4     4       8       84        4      spare   /dev/sdf4

/dev/sdh4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Sun May 30 08:50:37 2010
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 7d4a5230 - correct
         Events : 8079565

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8      116        2      active sync   /dev/sdh4

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       0        0        3      faulty removed

/dev/sdi4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon May 24 02:12:54 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7d3a6276 - correct
         Events : 7828427

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8      132        0      active sync   /dev/sdi4

   0     0       8      132        0      active sync   /dev/sdi4
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       8       84        3      active sync   /dev/sdf4

/dev/sdj4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Oct  6 18:01:45 2009
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7b1d23e4 - correct
         Events : 370

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8      148        3      active sync   /dev/sdj4

   0     0       8      132        0      active sync   /dev/sdi4
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       8      148        3      active sync   /dev/sdj4
   4     4       8       84        4      spare   /dev/sdf4
--- Attachment 2: diagnostic results (run before my intervention) ---

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdh4[2](S) sdf4[3](S) sdg4[1](S) sdi4[0](S)
      3863535616 blocks

unused devices: <none>

$ sudo mdadm --examine /dev/md1
mdadm: No md superblock detected on /dev/md1.

$ sudo mdadm --examine /dev/sdf4
/dev/sdf4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon May 24 02:12:54 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7d3a624c - correct
         Events : 7828427

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       84        3      active sync   /dev/sdf4

   0     0       8      132        0      active sync   /dev/sdi4
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       8       84        3      active sync   /dev/sdf4

$ sudo mdadm --examine /dev/sdg4
/dev/sdg4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Sat May 29 01:12:30 2010
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 7ccd4c92 - correct
         Events : 8079459

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8      100        1      active sync   /dev/sdg4

   0     0       0        0        0      removed
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       0        0        3      faulty removed

$ sudo mdadm --examine /dev/sdh4
/dev/sdh4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Sat May 29 01:26:30 2010
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 7d4898bb - correct
         Events : 8079505

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8      116        2      active sync   /dev/sdh4

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       0        0        3      faulty removed

$ sudo mdadm --examine /dev/sdi4
/dev/sdi4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon May 24 02:12:54 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7d3a6276 - correct
         Events : 7828427

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8      132        0      active sync   /dev/sdi4

   0     0       8      132        0      active sync   /dev/sdi4
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       8       84        3      active sync   /dev/sdf4

$ sudo mdadm --examine /dev/sdj4
[sudo] password for davef:
/dev/sdj4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4ddbd55:206c7f81:b855f41b:37d33d37
  Creation Time : Tue May  6 02:06:45 2008
     Raid Level : raid10
  Used Dev Size : 965883904 (921.14 GiB 989.07 GB)
     Array Size : 1931767808 (1842.28 GiB 1978.13 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Oct  6 18:01:45 2009
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 7b1d23e4 - correct
         Events : 370

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8      148        3      active sync   /dev/sdj4

   0     0       8      132        0      active sync   /dev/sdi4
   1     1       8      100        1      active sync   /dev/sdg4
   2     2       8      116        2      active sync   /dev/sdh4
   3     3       8      148        3      active sync   /dev/sdj4
   4     4       8       84        4      spare   /dev/sdf4
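P.S. Since the superblock dumps above are long, here is a small awk sketch for pulling out just the per-member Events counters, which is the comparison I keep coming back to (a stale member like /dev/sdj4 stands out immediately). It is purely illustrative; the here-document is only a two-device excerpt of the attachment, and in practice you would feed in the full saved `mdadm -E` output.

```shell
# List each member's Events counter from saved `mdadm -E` output.
# "/^\/dev\/sd/" lines mark the start of a device's superblock dump;
# the following "Events" line carries its event counter.
awk '/^\/dev\/sd/ { dev = $1 }       # remember which device block we are in
     /Events/     { print dev, $NF } # print that device with its counter
' <<'EOF'
/dev/sdi4:
         Events : 7828427
/dev/sdj4:
         Events : 370
EOF
```

For the excerpt above this prints `/dev/sdi4: 7828427` and `/dev/sdj4: 370`, which makes the roughly eight-month staleness of /dev/sdj4 obvious at a glance.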