Re: Cannot assemble DDF raid

NeilBrown <neilb@xxxxxxx> · Wed, 26 Feb 2014 17:10:36 +1100

On Fri, 21 Feb 2014 05:26:15 +0100 Christian Iversen <ci@xxxxxxxxxx> wrote:

> (please CC, not on the list currently)
> 
> I'm trying to recover from a 2-disk RAID5 failure on a Dell PERC 
> controller running:
> 
>    2 x 146GB RAID1 (system)
>    6 x 2TB RAID5 (data1)
>    6 x 3TB RAID5 (data2)
> 
> Normally, data1 and data2 are then striped with mdadm on Linux, to 
> increase performance over a JBOD-style usage. This has worked nicely for 
> a while.. until we lost 2 disks in data2 within a few hours of each 
> other. Murphy's law, and all that.
> 
> 
> I've made a raw disk copy (using ddrescue) from one of the dead disks, 
> onto a new disk. I tried putting this disk in the server, but it would 
> not accept it. (It said it was recognized as foreign, but import failed)
> 
> If I try to assemble the raid, I get this error:
> 
> [root@rescue]~ #mdadm -A /dev/md10 /dev/sd[abcde]
> mdadm: superblock on /dev/sde doesn't match others - assembly aborted
> 
> Now, this does seem to be true. All the GUIDs on sda-sdd:
> 
>  Controller GUID : 44656C6C:20202020:32374730:32524100:00743D30:00000021
>   Container GUID : 44656C6C:20202020:1000005B:10281F34:40371E8C:E9A398EA
>        VD GUID[0] : 44656C6C:20202020:1000005B:10281F34:3DB931F1:D8857F5D
>        VD GUID[1] : 44656C6C:20202020:1000005B:10281F34:3DB9326E:61E7B2D7
>        VD GUID[2] : 44656C6C:20202020:1000005B:10281F34:3F6ADA39:99DCAA67
> 
> While on the last 2 disks, we have this:
> 
>  Controller GUID : 44656C6C:20202020:32374730:32524100:00743D30:00000021
>   Container GUID : 44656C6C:20202020:1000005B:10281F34:3DB931F1:40FC2989
>        VD GUID[0] : 44656C6C:20202020:1000005B:10281F34:3DB931F1:D8857F5D
>        VD GUID[1] : 44656C6C:20202020:1000005B:10281F34:3DB9326E:61E7B2D7
>        VD GUID[2] : 44656C6C:20202020:1000005B:10281F34:3F6ADA39:99DCAA67
> 
> Notice how the last 8 bytes of the Container GUID are different.
> 
> I'm not quite sure how this happened, but I have a strong suspicion the 
> PERC controller did something less than clever, and now I can't start 
> the raid with mdadm OR perc.
> 
> 
> 
> I've tried to simply update the container GUID using a hex editor, but 
> this of course causes the CRCs to fail. (I reverted this change)
> 
> I have the following questions:
> 
>    1) If I could manage to change the Container GUID, would that
>       be a viable way to force the array to start, for further rescue?

I suspect so.
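
For what it's worth, a hex edit of the GUID only sticks if the checksum over the same header is recomputed afterwards. Below is a rough Python sketch of that idea, not a tested recovery tool. The function names are made up for illustration. The layout (CRC at byte offset 4, GUID at byte offset 8), the CRC convention (plain CRC32 over the header with the CRC field filled with 0xFF) and the big-endian storage are assumptions; check them against the DDF specification and mdadm's super-ddf.c before touching anything. Work on an image copy, never the original disk.

```python
import struct
import zlib

GUID_LEN = 24  # six 32-bit groups, as printed in the listings above


def parse_guid(text: str) -> bytes:
    """Turn '44656C6C:20202020:...' into its 24 raw bytes."""
    raw = bytes.fromhex(text.replace(":", ""))
    if len(raw) != GUID_LEN:
        raise ValueError(f"expected {GUID_LEN} bytes, got {len(raw)}")
    return raw


def header_crc(header: bytes, crc_off: int = 4) -> int:
    """CRC32 of the header with the CRC field set to 0xFFFFFFFF.

    ASSUMPTION: this is the checksum convention used for the header.
    """
    buf = bytearray(header)
    buf[crc_off:crc_off + 4] = b"\xff\xff\xff\xff"
    return zlib.crc32(bytes(buf)) & 0xFFFFFFFF


def patch_guid(header: bytes, guid_text: str,
               guid_off: int = 8, crc_off: int = 4) -> bytes:
    """Return a copy of the header with a new GUID and a matching CRC.

    ASSUMPTION: guid_off and crc_off are the right offsets, and the
    CRC is stored big-endian.
    """
    buf = bytearray(header)
    buf[guid_off:guid_off + GUID_LEN] = parse_guid(guid_text)
    struct.pack_into(">I", buf, crc_off, header_crc(bytes(buf), crc_off))
    return bytes(buf)
```

The header would be read from the image at whatever offset the metadata actually lives, passed through patch_guid() with the Container GUID from the other disks, and written back to the same offset. There may be more than one copy of the header on each disk, in which case each copy would need the same treatment.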

> 
>    2) Is there any other way to force the array to start? (--force does 
> not help)

Unfortunately not.

> 
>    3) Any other suggestions?
> 

Copy the bad disk to the good disk with ddrescue again?
or just copy the last megabyte or so.

or maybe hack mdadm to assume all container guids are identical.
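
As an illustration of the "last megabyte" option, here is a minimal Python sketch that copies the tail of one image onto the tail of another. The name copy_tail and the 1 MiB default are just placeholders, and it assumes the metadata of interest sits within that many bytes of the end of both images. Try it on image files or scratch copies first, since it overwrites the destination in place.

```python
import os


def copy_tail(src_path: str, dst_path: str, nbytes: int = 1 << 20) -> int:
    """Copy the last nbytes of src onto the last nbytes of dst.

    Offsets are taken relative to the end of each file, so the two
    do not need to be exactly the same size. Returns the number of
    bytes copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src_size = src.seek(0, os.SEEK_END)
        dst_size = dst.seek(0, os.SEEK_END)
        # Never copy more than either file actually holds.
        n = min(nbytes, src_size, dst_size)
        src.seek(src_size - n)
        dst.seek(dst_size - n)
        dst.write(src.read(n))
        return n
```

Called as copy_tail("source.img", "target.img"), it leaves everything before the final megabyte of target.img untouched.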

NeilBrown