Greetings,
I have a RAID 6 array (which contains a LUKS container) that I'm hoping to get
some help and insight into recovering. The server experienced some sort of
hardware event that forced a restart.
For the record, after the server completed the restart, the array looked
like this, "all spares":
md6 : inactive sdl1[7](S) sdh1[13](S) sdg1[14](S) sdk1[11](S) sdj1[10](S) sdi1[6](S) sdd1[2](S) sdf1[8](S) sdb1[12](S) sde1[3](S) sdc1[15](S)
      21488638704 blocks super 1.2
The server in question is running Ubuntu 12.04.5 LTS ("Precise") with mdadm
version 3.2.5-1ubuntu0 installed.
The mdadm array has the following characteristics:
RAID level: 6
Chunk size: 256k
Version: 1.2
Number of devices: 11
All attempts to assemble the array continued to result in the "all
spare" condition (output above). Thinking that the metadata had been
corrupted somehow, I set out to recreate the array.
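(For reference, the assemble attempts and superblock inspection were variations
on the following; I'm reconstructing the exact invocations from memory, so the
flags may have differed slightly. Every attempt left the array in the all-spare
state shown above.)
# mdadm --stop /dev/md6
# mdadm --assemble /dev/md6 /dev/sd[b-l]1
# mdadm --assemble --scan
# mdadm --examine /dev/sd[b-l]1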
The following are the dev_number fields from the metadata, captured before I
attempted to recreate the array:
# for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done
/dev/sdb1 0000000 12
/dev/sdc1 0000000 15
/dev/sdd1 0000000 2
/dev/sde1 0000000 3
/dev/sdf1 0000000 8
/dev/sdg1 0000000 14
/dev/sdh1 0000000 13
/dev/sdi1 0000000 6
/dev/sdj1 0000000 10
/dev/sdk1 0000000 11
/dev/sdl1 0000000 7
I used the following to extract the dev_roles table (the per-slot role indexes)
from a device I suspected wasn't corrupted (for the record, all of the members
returned the same data):
# dd 2> /dev/null if=/dev/sdc1 bs=2 count=15 skip=2176 | od -d
0000000 65534 65534 2 65534 65534 65534 4 5
0000020 65534 65534 7 8 0 9 65534
0000036
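(For anyone checking the arithmetic: with 1.2 metadata the superblock sits 4 KiB
into the partition, dev_number lives at offset 160 within the superblock, hence
skip=4256 above, and the dev_roles table starts at superblock offset 256, hence
skip=2176 with bs=2, i.e. byte 4352. If I'm reading mdadm's super1.c correctly,
65534 (0xfffe) marks a slot whose device has failed or been removed. A quick
helper along these lines ties the two dumps together, looking up each member's
own role by its dev_number:)
# for i in /dev/sd?1; do \
    dn=$(dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -An -D | tr -d ' '); \
    role=$(dd 2> /dev/null if=$i bs=1 count=2 skip=$((4352 + 2*dn)) | od -An -d | tr -d ' '); \
    echo "$i dev_number=$dn role=$role"; \
  done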
As you can see, there's already a visible mismatch between the dev_numbers and
the listed role slots. For instance, /dev/sdc1 reports a device number of 15,
but there is no slot 15 in the dev_roles list above (it only covers slots 0-14).
From log history, I pulled the last known working layout, circa July of this
year:
# mdadm -D /dev/md6
/dev/md6:
Version : 1.2
Creation Time : Sat Apr 23 06:22:23 2011
Raid Level : raid6
Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
Used Dev Size : 1953512192 (1863.01 GiB 2000.40 GB)
Raid Devices : 11
Total Devices : 10
Persistence : Superblock is persistent
Update Time : Sat Jun 21 21:13:45 2014
State : clean, degraded
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
Name : server:6 (local to host server)
UUID : 6958450b:e4dfa2f3:259ff733:f343a584
Events : 390345
    Number   Major   Minor   RaidDevice State
      12       8       17        0      active sync   /dev/sdb1
       3       8       65        1      active sync   /dev/sde1
       2       8       49        2      active sync   /dev/sdd1
       8       8       81        3      active sync   /dev/sdf1
       6       8      129        4      active sync   /dev/sdi1
       7       8      177        5      active sync   /dev/sdl1
       6       0        0        6      removed
      10       8      145        7      active sync   /dev/sdj1
      11       8      161        8      active sync   /dev/sdk1
      13       8      113        9      active sync   /dev/sdh1
      14       8       97       10      active sync   /dev/sdg1
The dev_number and role-index information, in conjunction with the historic
layout directly above, seemed to indicate that the proper recreation order and
command would be the following:
# mdadm --create /dev/md6 --assume-clean --level=6 --raid-devices=11 \
    --metadata=1.2 --chunk=256 \
    /dev/sdb1 /dev/sde1 /dev/sdd1 /dev/sdf1 /dev/sdi1 /dev/sdl1 \
    /dev/sdc1 /dev/sdj1 /dev/sdk1 /dev/sdh1 /dev/sdg1
I ran the above command.
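(In hindsight, I should have captured the old superblocks before creating;
something as simple as the following would have preserved the original data
offsets and role assignments for comparison, with the output file being an
arbitrary name of my choosing:)
# mdadm --examine /dev/sd[b-l]1 > /root/md6-examine-before-create.txt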
Here is the current output of "lsdrv" (the script by 'pturmel' on GitHub,
which I've seen several people on this list ask for):
PCI [ahci] 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
├scsi 0:0:0:0 ATA ST31500541AS {6XW03WTD}
│└sda 1.36t [8:0] Partitioned (dos)
│ ├sda1 109.79m [8:1] ext4 {07f99e8c-95d2-483d-9850-05f04820c3f6}
│ │└Mounted as /dev/sda1 @ /boot
│ ├sda2 2.01g [8:2] swap {d137430d-815a-4c45-a394-9bece3aa7136}
│ ├sda3 7.01g [8:3] ext4 {8db73200-8d9d-4991-9802-b13f1550a9d9}
│ │└Mounted as /dev/disk/by-uuid/8db73200-8d9d-4991-9802-b13f1550a9d9 @ /
│ └sda4 1.36t [8:4] Empty/Unknown
│  └dm-0 1.36t [252:0] xfs {db7ddb53-080c-45ba-ab4d-e45d35eb451c}
│   └Mounted as /dev/mapper/enc @ /encrypted
├scsi 1:0:0:0 ATA ST2000DL003-9VT1 {5YD4VZLV}
│└sdb 1.82t [8:16] Partitioned (gpt)
│ └sdb1 1.82t [8:17] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 2:0:0:0 ATA ST2000DM001-1CH1 {Z1E8GNFQ}
│└sdc 1.82t [8:32] Partitioned (gpt)
│ └sdc1 1.82t [8:33] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 3:0:0:0 ATA ST2000DL003-9VT1 {5YD2PZM3}
│└sdd 1.82t [8:48] Partitioned (gpt)
│ └sdd1 1.82t [8:49] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 4:0:0:0 ATA ST2000DL003-9VT1 {5YD2J0XD}
│└sde 1.82t [8:64] Partitioned (gpt)
│ └sde1 1.82t [8:65] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
└scsi 5:0:0:0 ATA ST2000DL003-9VT1 {5YD3XE9M}
 └sdf 1.82t [8:80] Partitioned (gpt)
  └sdf1 1.82t [8:81] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
PCI [sata_sil24] 04:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
├scsi 6:0:0:0 ATA ST2000DL003-9VT1 {5YD6JW2L}
│└sdg 1.82t [8:96] Partitioned (gpt)
│ └sdg1 1.82t [8:97] MD raid6 (11) inactive 'server:6' {65daae65-118b-896a-6205-0f2c4dacb4de}
├scsi 7:0:0:0 ATA ST2000DL003-9VT1 {6YD05E5Y}
│└sdh 1.82t [8:112] Partitioned (gpt)
│ └sdh1 1.82t [8:113] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 8:x:x:x [Empty]
└scsi 9:x:x:x [Empty]
PCI [sata_sil24] 06:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
├scsi 10:0:0:0 ATA ST32000542AS {5XW1PVCZ}
│└sdi 1.82t [8:128] Partitioned (gpt)
│ └sdi1 1.82t [8:129] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 11:0:0:0 ATA ST2000DL003-9VT1 {5YD2SND2}
│└sdj 1.82t [8:144] Partitioned (gpt)
│ └sdj1 1.82t [8:145] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
├scsi 12:0:0:0 ATA ST2000DL003-9VT1 {5YD4JTZP}
│└sdk 1.82t [8:160] Partitioned (gpt)
│ └sdk1 1.82t [8:161] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
└scsi 13:0:0:0 ATA ST32000542AS {5XW1KAEA}
 └sdl 1.82t [8:176] Partitioned (gpt)
  └sdl1 1.82t [8:177] MD raid6 (11) inactive 'server:6' {c88f3b40-c7d4-a33a-5007-ab8ef784c0c2}
Other Block Devices
├loop0 0.00k [7:0] Empty/Unknown
├loop1 0.00k [7:1] Empty/Unknown
├loop2 0.00k [7:2] Empty/Unknown
├loop3 0.00k [7:3] Empty/Unknown
├loop4 0.00k [7:4] Empty/Unknown
├loop5 0.00k [7:5] Empty/Unknown
├loop6 0.00k [7:6] Empty/Unknown
├loop7 0.00k [7:7] Empty/Unknown
├ram0 64.00m [1:0] Empty/Unknown
├ram1 64.00m [1:1] Empty/Unknown
├ram2 64.00m [1:2] Empty/Unknown
├ram3 64.00m [1:3] Empty/Unknown
├ram4 64.00m [1:4] Empty/Unknown
├ram5 64.00m [1:5] Empty/Unknown
├ram6 64.00m [1:6] Empty/Unknown
├ram7 64.00m [1:7] Empty/Unknown
├ram8 64.00m [1:8] Empty/Unknown
├ram9 64.00m [1:9] Empty/Unknown
├ram10 64.00m [1:10] Empty/Unknown
├ram11 64.00m [1:11] Empty/Unknown
├ram12 64.00m [1:12] Empty/Unknown
├ram13 64.00m [1:13] Empty/Unknown
├ram14 64.00m [1:14] Empty/Unknown
└ram15 64.00m [1:15] Empty/Unknown
Following the recreation, the array now looks like this:
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md6 : active raid6 sdh1[9] sdk1[8] sdj1[7] sdc1[6] sdl1[5] sdi1[4] sdf1[3] sdd1[2] sde1[1] sdb1[0]
      17580439296 blocks super 1.2 level 6, 256k chunk, algorithm 2 [11/10] [UUUUUUUUUUU]
# mdadm -D /dev/md6
/dev/md6:
Version : 1.2
Creation Time : Fri Nov 7 05:40:16 2014
Raid Level : raid6
Array Size : 17580439296 (16766.01 GiB 18002.37 GB)
Used Dev Size : 1953382144 (1862.89 GiB 2000.26 GB)
Raid Devices : 11
Total Devices : 11
Persistence : Superblock is persistent
Update Time : Fri Nov 7 05:40:16 2014
State : clean
Active Devices : 11
Working Devices : 11
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
Name : server:6 (local to host server)
UUID : b306872f:5ef902a8:76f5e233:f220f4d4
Events : 0
    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       65        1      active sync   /dev/sde1
       2       8       49        2      active sync   /dev/sdd1
       3       8       81        3      active sync   /dev/sdf1
       4       8      129        4      active sync   /dev/sdi1
       5       8      177        5      active sync   /dev/sdl1
       6       8       33        6      active sync   /dev/sdc1
       7       8      145        7      active sync   /dev/sdj1
       8       8      161        8      active sync   /dev/sdk1
       9       8      113        9      active sync   /dev/sdh1
      10       8       97       10      active sync   /dev/sdg1
This array should contain a LUKS container, but it is no longer recognized,
and a hexdump of the first 16 lines shows no trace of the LUKS header:
# cryptsetup luksOpen /dev/md6 luks
Device /dev/md6 is not a valid LUKS device.
# hexdump -C /dev/md6 | head -n16
00000000  0b 37 89 e0 66 96 7a d4  6c 5b 57 09 a5 8d 6a c5  |.7..f.z.l[W...j.|
00000010  a7 65 20 6e f0 db 74 db  03 d8 e9 2b 39 05 37 a4  |.e n..t....+9.7.|
00000020  cb 25 d7 7b fd cf b5 b4  12 ad e2 24 24 de 66 42  |.%.{.......$$.fB|
00000030  61 a2 1b ea 8b 5c 04 38  7e 5e 61 11 3d ba 99 35  |a....\.8~^a.=..5|
00000040  b7 e9 e6 76 72 18 d2 d5  bd cd 1b ed 59 15 fb 83  |...vr.......Y...|
00000050  bc 57 94 85 31 c1 3e af  51 f1 25 50 db 57 d3 cd  |.W..1.>.Q.%P.W..|
00000060  69 d5 31 23 df 01 ef 03  e3 92 66 c6 1f 38 3f 57  |i.1#......f..8?W|
00000070  67 20 38 8c c2 ec 25 dc  59 42 b4 5d 9d 9e c1 79  |g 8...%.YB.]...y|
00000080  4a f5 e1 ad f8 08 16 d5  37 3f f6 83 62 f2 6f f5  |J.......7?..b.o.|
00000090  53 95 4f 69 ce 7c ba 4c  86 ef a1 1c 04 d7 b3 17  |S.Oi.|.L........|
000000a0  cd ea 5f 25 56 a4 0d 6f  64 e9 51 b5 71 b3 18 7f  |.._%V..od.Q.q...|
000000b0  46 e7 8b ab 08 ae f5 ed  65 0d 8f 3e 8b 03 25 5c  |F.......e..>..%\|
000000c0  bb 50 dc e6 31 33 4a 88  8e 22 20 72 f0 11 71 d0  |.P..13J.." r..q.|
000000d0  59 c7 9d 20 f8 e2 f0 f8  75 5f ea 4a 57 d7 d7 9e  |Y.. ....u_.JW...|
000000e0  c8 05 85 9d d7 cf c9 ab  53 de 11 6f bf d4 e3 b2  |........S..o....|
000000f0  f6 5e 1e 46 5c 16 ae 46  a3 b5 9b f4 b9 ff ca 0c  |.^.F\..F........|
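If it helps, I figure a read-only scan of the start of the array for the LUKS
magic ("LUKS", 0xba, 0xbe) could rule out the header merely having been shifted
rather than overwritten; something along these lines, stepping by the 256k
chunk size over the first 16 MiB (the 16 MiB range is just an arbitrary choice
on my part):
# for off in $(seq 0 262144 16777216); do \
    sig=$(dd 2> /dev/null if=/dev/md6 bs=1 count=6 skip=$off | od -An -tx1 | tr -d ' \n'); \
    [ "$sig" = "4c554b53babe" ] && echo "LUKS magic at offset $off"; \
  done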
Is the "mdadm --create" operation that I issued, incorrect? Have I done
anything in error?
Unfortunately, I do not have a backup of the LUKS header; I had personally
never encountered a situation like this, nor was I aware that LUKS headers
should be backed up at all.
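(For future reference, and for anyone else reading this later, I gather that
the backup would have been a one-liner along these lines, with the output path
being whatever one chooses:)
# cryptsetup luksHeaderBackup /dev/md6 --header-backup-file /root/md6-luks-header.img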
If it's relevant, I did observe that /dev/sdb1 and /dev/sdg1 were showing
"Offline_Uncorrectable" errors via SMART, but nothing that I would have
imagined could contribute to the current predicament.
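(For what it's worth, the attribute in question shows up in the usual smartctl
dump, e.g.:)
# smartctl -A /dev/sdb | grep -i offline_uncorrectable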
Is my data gone? Any and all insight is extremely welcomed and appreciated.
Warm regards,
-xar