Re: RAID5 disk failure during rebuild of spare, any chance of recovery when one of the failed devices is suspected to be intact?

On Mon, Aug 16, 2010 at 10:43 AM, Tim Small <tim@xxxxxxxxxxx> wrote:
> On 16/08/10 07:12, Nicolas Jungers wrote:
>>
>> On 08/16/2010 07:54 AM, Tor Arne Vestbø wrote:
>>>
>>> You mean sdc and sde plus either sdb or sdd, depending on which
>>> one I think is more sane at this point?
>>
>> I'd try both.  Do a ddrescue of the failing one and try that (with a
>> copy of the others) and check what's coming out.
>
> As an alternative to using ddrescue, you could quickly prototype various
> arrangements (without writing anything to the drives) using a device-mapper
> copy-on-write mapping - I posted some details to the list a while back when
> I was trying to use this to reconstruct a hw raid array...  Check the list
> archives for details.

Cool, here's what I tried:

Created sparse files for each of the devices:

  dd if=/dev/zero of=sdb_cow bs=1 count=0 seek=2GB
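
A quick sanity check that the file really is sparse (2GB apparent size,
but almost no disk blocks actually allocated):

  ls -ls sdb_cow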

Mapped each one to a loop device, e.g.:

  losetup /dev/loop1 sdb_cow

Then ran the following for each device:

  cow_size=`blockdev --getsize /dev/sdb1`
  chunk_size=64
  echo "0 $cow_size snapshot /dev/sdb1 /dev/loop1 p $chunk_size" |
dmsetup create sdb1_cow
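
Putting the three steps together, the per-device setup is roughly this
(sketch only -- the loop device numbers just need to be free, and I'm
assuming the partitions are sdb1, sdc1 and sde1):

  i=1
  for d in sdb1 sdc1 sde1; do
    dd if=/dev/zero of=${d}_cow bs=1 count=0 seek=2GB  # 2GB sparse COW file
    losetup /dev/loop$i ${d}_cow
    cow_size=`blockdev --getsize /dev/$d`              # size in 512-byte sectors
    echo "0 $cow_size snapshot /dev/$d /dev/loop$i p 64" |
      dmsetup create ${d}_cow                          # all writes land in the COW
    i=$((i+1))
  done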

After these were created I tried the following:

# mdadm -v -C /dev/md0 -l5 -n4 /dev/mapper/sdb1_cow \
    /dev/mapper/sdc1_cow missing /dev/mapper/sde1_cow
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/mapper/sdb1_cow appears to be part of a raid array:
    level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
mdadm: /dev/mapper/sdc1_cow appears to be part of a raid array:
    level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
mdadm: /dev/mapper/sde1_cow appears to be part of a raid array:
    level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
mdadm: size set to 732571904K
Continue creating array? Y
mdadm: array /dev/md0 started.
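
Note that passing "missing" for the sdd slot means md starts the array
degraded and never kicks off a resync, so nothing gets reconstructed
onto a fourth device. /proc/mdstat should show the array as [UU_U]:

  cat /proc/mdstat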

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Mon Aug 16 18:20:06 2010
     Raid Level : raid5
     Array Size : 2197715712 (2095.91 GiB 2250.46 GB)
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Aug 16 18:20:06 2010
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 916ceaa2:b877a3cc:3973abef:31f2d600 (local to host monstre)
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0     251        9        0      active sync   /dev/block/251:9
       1     251       10        1      active sync   /dev/block/251:10
       2       0        0        2      removed
       3     251       12        3      active sync   /dev/block/251:12

And I can now mount /dev/mapper/raid-home!
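
To be safe, mounting read-only keeps any stray writes out of the COW
files (the mount point here is just an example):

  mount -o ro /dev/mapper/raid-home /mnt/recovery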

The question now is: what next? Should I start copying things off to a
backup, or first run fsck or something else to try to repair errors?
Or are the 2GB sparse files perhaps too small for anything like that?
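
For what it's worth, the snapshot targets report how many of their COW
sectors are allocated, so I can at least watch how close each one gets
to the 2GB limit while experimenting:

  for d in sdb1 sdc1 sde1; do dmsetup status ${d}_cow; done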

Thanks!

Tor Arne