Re: RAID5 disk failure during rebuild of spare, any chance of recovery when one of the failed devices is suspected to be intact?

On 08/16/2010 06:27 PM, Tor Arne Vestbø wrote:
On Mon, Aug 16, 2010 at 10:43 AM, Tim Small <tim@xxxxxxxxxxx> wrote:
On 16/08/10 07:12, Nicolas Jungers wrote:

On 08/16/2010 07:54 AM, Tor Arne Vestbø wrote:

You mean sdc and sde plus either sdb or sdd, depending on which
one I think is more sane at this point?

I'd try both.  Do a ddrescue of the failing one and try that (with copies of
the others) and check what comes out.
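
A typical ddrescue run for that would look something like the sketch below (assuming the suspect disk is /dev/sdd, that /dev/sdX is a spare disk at least as large, and that the map file lives on a separate, healthy filesystem; adjust all three names to your setup):

   ddrescue -d -n /dev/sdd /dev/sdX /root/sdd.map    # first pass: grab the easy sectors, skip scraping bad areas
   ddrescue -d -r3 /dev/sdd /dev/sdX /root/sdd.map   # second pass: retry the bad areas a few times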

As an alternative to using ddrescue, you could quickly prototype various
arrangements (without writing anything to the drives) using a device-mapper
copy-on-write mapping - I posted some details to the list a while back when
I was trying to use this to reconstruct a hw raid array...  Check the list
archives for details.

Cool, here's what I tried:

Created a sparse file for each of the devices:

   dd if=/dev/zero of=sdb_cow bs=1 count=0 seek=2GB

Mapped each one to a loop device:

   losetup /dev/loop1 sdb_cow

Then ran the following for each device:

   cow_size=`blockdev --getsize /dev/sdb1`   # origin size in 512-byte sectors
   chunk_size=64                             # COW chunk size, also in 512-byte sectors (32KB)
   echo "0 $cow_size snapshot /dev/sdb1 /dev/loop1 p $chunk_size" \
      | dmsetup create sdb1_cow
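
(For reference, the three steps above can be rolled into one loop over all the partitions involved. This is just a sketch, assuming /dev/sdb1, /dev/sdc1 and /dev/sde1, and a losetup that supports -f/--show to pick a free loop device:)

   for part in sdb1 sdc1 sde1; do
      dd if=/dev/zero of=${part}_cow bs=1 count=0 seek=2GB    # sparse 2GB COW file
      loopdev=$(losetup -f --show ${part}_cow)                # attach it to the next free loop device
      size=$(blockdev --getsize /dev/$part)                   # origin size in 512-byte sectors
      echo "0 $size snapshot /dev/$part $loopdev p 64" | dmsetup create ${part}_cow
   done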

After these were created I tried the following:

# mdadm -v -C /dev/md0 -l5 -n4 /dev/mapper/sdb1_cow \
    /dev/mapper/sdc1_cow missing /dev/mapper/sde1_cow
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/mapper/sdb1_cow appears to be part of a raid array:
     level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
mdadm: /dev/mapper/sdc1_cow appears to be part of a raid array:
     level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
mdadm: /dev/mapper/sde1_cow appears to be part of a raid array:
     level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
mdadm: size set to 732571904K
Continue creating array? Y
mdadm: array /dev/md0 started.
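
(Worth noting: since only the copy-on-write mappings get written to, the old superblocks are still intact on the real partitions, so the original device order, layout and chunk size can be cross-checked at any point with something like:)

# mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1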

# mdadm --detail /dev/md0
/dev/md0:
         Version : 00.90
   Creation Time : Mon Aug 16 18:20:06 2010
      Raid Level : raid5
      Array Size : 2197715712 (2095.91 GiB 2250.46 GB)
   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
    Raid Devices : 4
   Total Devices : 3
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Mon Aug 16 18:20:06 2010
           State : clean, degraded
  Active Devices : 3
Working Devices : 3
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 64K

            UUID : 916ceaa2:b877a3cc:3973abef:31f2d600 (local to host monstre)
          Events : 0.1

     Number   Major   Minor   RaidDevice State
        0     251        9        0      active sync   /dev/block/251:9
        1     251       10        1      active sync   /dev/block/251:10
        2       0        0        2      removed
        3     251       12        3      active sync   /dev/block/251:12

And I can now mount /dev/mapper/raid-home!
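
(In case it helps anyone following along: /dev/mapper/raid-home looks like an LVM logical volume "home" in a volume group "raid" on top of md0; the names and the mount point below are guesses based on that path. Activating and mounting it read-only would go roughly like this:)

   vgscan
   vgchange -ay raid
   mount -o ro /dev/mapper/raid-home /mnt/recovery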

The question now is, what next? Should I start copying things off to a
backup, or run fsck first, or something else to try to repair errors?
Or are the 2GB sparse files perhaps too small for anything like that?

If it were me: copy everything off first. You have an unreliable disk in the middle of your data.
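
While copying, it is also worth keeping an eye on how full the 2GB COW files are getting, since a device-mapper snapshot is invalidated once its COW space runs out. The snapshot status line reports allocated/total sectors, so (using the mapping names above and a placeholder destination path) something along these lines covers both jobs:

   rsync -aHAX /mnt/recovery/ /backup/home/   # copy the data off first
   dmsetup status sdb1_cow                    # "snapshot A/B ..." = A of B COW sectors used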

N.