So, I ran "parted" and "mklabel" on the wrong disk; I did it on a RAID0 member where I had used the entire disk (instead of a partition as is the convention). Even though I quit without saving, the damage was done. On reboot, the RAID would not assemble. mdadm --examine looks something like this: /dev/sda: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 4c59a1bc:9cfbe61a:c08ad011:3fda9db6 Name : hostname:volname Creation Time : Sun May 12 20:27:33 2013 Raid Level : raid0 Raid Devices : 4 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB) Data Offset : 2048 sectors Super Offset : 8 sectors State : clean Device UUID : 6c0fbaf1:314a4802:429dc206:648ef11b Update Time : Sun May 12 20:27:33 2013 Checksum : 381d9692 - correct Events : 0 Chunk Size : 512K Device Role : Active device 2 Array State : AAAA ('A' == active, '.' == missing) There are active devices 1,2,3 which only leaves 0. So, I basically reviewed these links: https://raid.wiki.kernel.org/index.php/RAID_Recovery https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID And that second link was a huge win; using overlay files is a brilliant move. Aside: I made a bone-headed move while trying to follow the directions without GNU parallel, and ended up trying to mdadm --create on $DEVICES instead of $OVERLAYS but for whatever reason mdadm.super1 failed due to resource in use (by losetup, presumably) on one of the disks so I think I'm okay. So, anyway, I can re-create the RAID using: UUID=4c59a1bc:9cfbe61a:c08ad011:3fda9db6 DEVICES="/dev/sdc /dev/sdd /dev/sda /dev/sdb" (they are in different order now) parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7 {#}' ::: $DEVICES parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICE OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES) # /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sda /dev/mapper/sdb Then I create using overlays: # mdadm --create /dev/md127 -v -l 0 -n 4 $OVERLAYS mdadm: chunk size defaults to 512K mdadm: /dev/mapper/sdc appears to be part of a raid array: level=raid0 devices=0 ctime=Wed Dec 31 16:00:00 1969 mdadm: partition table exists on /dev/mapper/sdc but will be lost or meaningless after creating array mdadm: /dev/mapper/sdd appears to be part of a raid array: level=raid0 devices=4 ctime=Sun May 12 20:27:33 2013 mdadm: /dev/mapper/sda appears to be part of a raid array: level=raid0 devices=4 ctime=Sun May 12 20:27:33 2013 mdadm: /dev/mapper/sdb appears to be part of a raid array: level=raid0 devices=4 ctime=Sun May 12 20:27:33 2013 Continue creating array? y mdadm: Defaulting to version 1.2 metadata mdadm: array /dev/md127 started. However, the LUKS on it won't work: # cryptsetup luksOpen /dev/md127 c Device /dev/md127 is not a valid LUKS device. And I think I know the reason for this. The LUKS header is here: # dd if=/dev/md127 bs=1M count=1 | hexdump -C | grep LUKS 0002e000 4c 55 4b 53 ba be 00 01 61 65 73 00 00 00 00 00 |LUKS....aes.....| And by comparison, the LUKS header on a similarly-organized (4 disk SW RAID0 partition) that I just created for scratch space looks like this: # dd if=/dev/md126 bs=1M count=1 | hexdump -C | grep LUKS 00000000 4c 55 4b 53 ba be 00 01 61 65 73 00 00 00 00 00 |LUKS....aes.....| So, there's an obvious problem with offsets involved; possibly the RAID was created with a different chunk size as the default one, so when I re-create a RAID on top of it, I get things at strange offsets. 
I am quite confident the drives are in the right order, since:

# dd if=/dev/sdc bs=1M count=1 | hexdump -C | grep LUKS
00030000  4c 55 4b 53 ba be 00 01  61 65 73 00 00 00 00 00  |LUKS....aes.....|

No such LUKS header exists in the first 1M of any of the other disks. However, I do find it a bit odd that the new, similar RAID has this (note that this RAID is installed into a partition on every drive):

# dd if=/dev/sdf1 bs=1M count=1 | hexdump -C | grep LUKS
00002000  4c 55 4b 53 ba be 00 01  61 65 73 00 00 00 00 00  |LUKS....aes.....|

Which suggests to me that there's something a bit off in the header or chunking. Oddly:

# dd if=/dev/sdc bs=1M count=100 | hexdump -C | grep LUKS
00030000  4c 55 4b 53 ba be 00 01  61 65 73 00 00 00 00 00  |LUKS....aes.....|
00100000  4c 55 4b 53 ba be 00 01  61 65 73 00 00 00 00 00  |LUKS....aes.....|

It's somewhat possible that one of these is left over from another time, because I did not prefill the disks. I am quite sure that the disks in the RAID which failed were not partitioned.

Incidentally, the "break" between unencrypted and encrypted data on each disk is:

/dev/sdd:
00005790  20 53 61 74 20 4d 61 72  20 20 39 20 31 39 3a 35  | Sat Mar  9 19:5|
000057a0  37 3a 34 35 20 32 30 31  33 0a 0a 00 00 00 00 00  |7:45 2013.......|
000057b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00030000  0a d0 1e 45 be 9b e6 89  3c 5b c1 ae 6c 11 61 64  |...E....<[..l.ad|
00030010  94 b2 84 8a c4 3e 25 32  a3 68 43 d8 8a 7a 2b de  |.....>%2.hC..z+.|
00030020  a2 4d e9 86 3c 78 08 c3  be 75 f2 bf 76 d0 12 33  |.M..<x...u..v..3|
00030030  a6 99 58 21 1f b5 ae d6  47 c9 6c 72 18 48 b9 b0  |..X!....G.lr.H..|
00030040  2c 5d a8 43 a5 64 5e 9c  6d e3 dc 3e 63 fe f1 1a  |,].C.d^.m..>c...|
00030050  e7 f0 72 29 60 e1 75 74  4d 90 b6 b0 d8 70 0e ff  |..r)`.utM....p..|
00030060  f4 10 dd 00 09 3d 52 a1  a0 c0 16 52 9c a7 62 96  |.....=R....R..b.|

/dev/sda:
0002fc00  0e 0e 0e 0e 0e 0e 0e 0e  0e 0e 0e 0e 0e 0e 0e 0e  |................|
*
0002fe00  0f 0f 0f 0f 0f 0f 0f 0f  0f 0f 0f 0f 0f 0f 0f 0f  |................|
*
00030000  1a ae e4 a0 e0 35 54 fc  bb 12 68 1a 4d 2f 5b e7  |.....5T...h.M/[.|
00030010  ef cb 96 45 95 8c bd 05  4f cd 95 4e c9 46 80 be  |...E....O..N.F..|
00030020  f3 b4 fc 45 06 d7 84 ee  9b 42 57 92 65 4c 29 2c  |...E.....BW.eL),|

/dev/sdb:
00000260  00 00 00 00 00 00 00 00  00 10 00 00 00 00 00 00  |................|
00000270  00 f0 02 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000280  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000800  4a be d0 23 57 a8 4b bc  7e 02 05 bd 59 4c 00 07  |J..#W.K.~...YL..|
00000810  03 48 e9 92 2b 3a 16 1f  f5 f3 37 f3 25 46 b0 14  |.H..+:....7.%F..|

It's a bit strange that /dev/sdb has some encrypted data before offset 0x30000, but given that these disks were reused without clearing or prefilling, and that /dev/sda and /dev/sdd start encrypted data at 0x30000, chances are good that's left over from a previous use.

Anyone got any ideas on how to recover it?

It would be trivial to partition /dev/sdc and, in so doing, ignore the first 0x30000 or 0x100000 bytes, but since none of the other disks are partitioned, I don't think that's the way it was configured initially.

FWIW, I may have configured this RAID initially on an older Debian ARMv5TE system and migrated it to a more conventional x86_64 Ubuntu box when the ARMv5/OpenRD box proved unstable.
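One check that might settle which of the two headers on /dev/sdc is the live one (again just a sketch, using losetup's read-only -r and -o offset options; the two offsets are simply the hexdump hits above): map the disk read-only starting at each candidate offset and ask cryptsetup to dump whatever header it finds there. Note that 0x100000 is exactly the 2048-sector data offset from --examine, so if that one dumps cleanly, the copy at 0x30000 is probably the stale leftover.

# Sketch: inspect both candidate LUKS headers on /dev/sdc without writing anything.
# 0x30000 and 0x100000 are the hexdump hits; 0x100000 also equals the
# 2048-sector Data Offset reported by mdadm --examine.
for off in $((0x30000)) $((0x100000)); do
    loop=$(losetup -r -f --show -o $off /dev/sdc)
    echo "=== /dev/sdc @ $off ($loop) ==="
    cryptsetup luksDump $loop || echo "no valid LUKS header at $off"
    losetup -d $loop
done

If the header at 0x100000 dumps cleanly, then re-creating the array on the overlays with the data offset pinned to 2048 sectors (as in the earlier sketch) should land it at offset 0 of /dev/md127, and luksOpen can be retried there.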
-- 
http://www.subspacefield.org/~travis/
Remediating... LIKE A BOSS