Re: RAID 6 (containing LUKS dm-crypt) recovery help.

On 11/7/2014 5:24 AM, Peter Grandi wrote:
[ ... ] The server experienced some sort of hardware event
that resulted in a mandatory restart of the server.
Details would be helpful, because when some problem happens the
standard advice is "reload from backups". If you want to
shortcut that to a mostly-recovery, context matters for figuring
out how to do it, and how safely.

[ ... ] completed the restart, the array looked like this,
"all spares":
md6 :
What happened to the other MD sets on the same server, if any?
Any damage? Because if those suffered no damage, there is the
possibility that the disk rack backplane holding the members of
'md6' got damaged, or the specific host adapter; and that the MD
set content is entirely undamaged and the funny stuff being read
is a transmission problem.

inactive sdl1[7](S) sdh1[13](S) sdg1[14](S) sdk1[11](S)
sdj1[10](S) sdi1[6](S) sdd1[2](S) sdf1[8](S) sdb1[12](S)
sde1[3](S) sdc1[15](S) 21488638704 blocks super 1.2
"Clever" people hide details as possible, and go to such lengths
as to actually remove vital information as for example what
literally follows "super 1.2" here. Because actual quotes are
too "insipid" and paraphrases are more "challenging":

The mdadm array has the following characteristics: RAID level:
6 Chunk size: 256k Version: 1.2 Number of devices: 11
How do you know? Is this part of your records or from actual
output of 'mdadm --examine'?

But assuming the above is somewhat reliable, there is an
"interesting" situation: an 11-member RAID 6 set has 11 - 2 = 9
data devices, yet in "21488638704 blocks" the number
21,488,638,704 is not a whole multiple of 9:

   $ factor 21488638704
   21488638704: 2 2 2 2 3 13 1801 19121
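
The remainder, for what it's worth (plain shell arithmetic):

   $ echo $(( 21488638704 % 9 ))
   6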

All attempts to assemble the array continued to result in the "all
spare" condition (output above). Thinking that the metadata had been
corrupted somehow,
Apparently without ever trying 'mdadm --detail /dev/md6' or
'mdadm --examine /dev/sd...' as per:

   https://raid.wiki.kernel.org/index.php/RAID_Recovery
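
A minimal sketch of those two commands, with the member names
from the listing above spelled out:

   mdadm --detail /dev/md6
   mdadm --examine /dev/sd[b-l]1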

I set out to recreate the array.
Quite "brave":

   https://raid.wiki.kernel.org/index.php/RAID_Recovery
   «Restore array by recreating (after multiple device failure)
   Recreating should be considered a *last* resort, only to be
   used when everything else fails.
   People getting this wrong is one of the primary reasons people
   lose data. It is very commonly used way too early in the fault
   finding process. You have been warned!»

The following are the dev_number fields from the metadata,
before I attempted to recreate the array:

   for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done

I used the following to extract the index position of each
device, on a device I suspected wasn't corrupted (for the
record, they all returned the same data): [ ... ]
It is very "astute" indeed to use 'dd' instead of 'mdadm
--examine'.  For example it "encourages" people who might want
to help to spend some extra time checking your offsets, that
"teaches" them.

[ ... ]
      Number   Major   Minor   RaidDevice State
        12       8       17        0      active sync   /dev/sdb1
         3       8       65        1      active sync   /dev/sde1
         2       8       49        2      active sync   /dev/sdd1
         8       8       81        3      active sync   /dev/sdf1
         6       8      129        4      active sync   /dev/sdi1
         7       8      177        5      active sync   /dev/sdl1
         6       0        0        6      removed
        10       8      145        7      active sync   /dev/sdj1
        11       8      161        8      active sync   /dev/sdk1
        13       8      113        9      active sync   /dev/sdh1
        14       8       97       10      active sync   /dev/sdg1
The dev_numbers and index position information in conjunction
with the historic data (directly above) seemed to indicate
that the proper recreation order and command would be the
following:
mdadm --create /dev/md6 --assume-clean --level=6
--raid-devices=11 --metadata=1.2 --chunk=256 /dev/sdb1
/dev/sde1 /dev/sdd1 /dev/sdf1 /dev/sdi1 /dev/sdl1 /dev/sdc1
/dev/sdj1 /dev/sdk1 /dev/sdh1 /dev/sdg1
The main consequence of the above is that the original MD member
metadata blocks are no longer available unless something like
this has been done:

   https://raid.wiki.kernel.org/index.php/RAID_Recovery
   «Preserving RAID superblock information
   One of the most useful things to do first, when trying to
   recover a broken RAID array, is to preserve the information
   reported in the RAID superblocks on each device at the time
   the array went down (and before you start trying to recreate
   the array). Something like
     mdadm --examine /dev/sd[bcdefghijklmn]1 >> raid.status»

If you went to the lengths to write 'dd' expressions, you might
as well have saved the output of '--examine'. Perhaps you did,
but if you did not attach that output to your request for help
it would be rather "stunning".

[ ... ]

Is the "mdadm --create" operation that I issued, incorrect?
Have I done anything in error?
There is something strange: what you report being the output of
'--detail' from July:

       Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 1953512192 (1863.01 GiB 2000.40 GB)

and the output of '--detail' for the re-created:

       Array Size : 17580439296 (16766.01 GiB 18002.37 GB)
    Used Dev Size : 1953382144 (1862.89 GiB 2000.26 GB)

Neither pair of numbers matches; they are *slightly* different.
In particular it is rather strange that the "Used Dev Size" is
different. How is that possible? Have the disks shrunk a little
in the meantime? :-)

It is intriguing that the difference between 1953512192 and
1953382144 is 1024*127KiB or 1024*254 sectors.

Also I have noticed that the MD set is composed of disks of 3
different models (ST2000DL003-9VT1, ST2000DM001-1CH1,
ST32000542AS)...

Is my data gone? Any and all insight is extremely welcome and
appreciated.
Whether your data is gone depends on what kind of hardware issue
you have had, and on the consequences of the "brave" '--create'
above. But also on how the MD set was set up, e.g. with members
of slightly different sizes. The inconsistencies in the reported
numbers are "confusing".

Hello Peter,

Thank you very much for your thorough and responsive reply. I will do my best to clarify where possible.

[ ... ] The server experienced some sort of hardware event
that resulted in a mandatory restart of the server.
Details would be helpful, because when some problem happens the
standard advice is "reload from backups". If you want to
shortcut that to a mostly-recovery, context matters for figuring
out how to do it, and how safely.

Regarding the nature of the hardware event, unfortunately details are in short supply: the server became unresponsive both at the console and when attempting to connect via SSH, prompting a restart of the server. I don't believe there was evidence of a power drop or loss. No server or kernel logs are available for review.

[ ... ] completed the restart, the array looked like this,
"all spares":
md6 :
What happened to the other MD sets on the same server, if any?
Any damage? Because if those suffered no damage, there is the
possibility that the disk rack backplane holding the members of
'md6' got damaged, or the specific host adapter; and that the MD
set content is entirely undamaged and the funny stuff being read
is a transmission problem.

"md6" is the only MD set on the server, so name as it is has a raid-level 6. Sorry for any confusion.

The mdadm array has the following characteristics: RAID level:
6 Chunk size: 256k Version: 1.2 Number of devices: 11
How do you know? Is this part of your records or from actual
output of 'mdadm --examine'?
All attempts to assemble the array continued to result in the "all
spare" condition (output above). Thinking that the metadata had been
corrupted somehow,
Apparently without ever trying 'mdadm --detail /dev/md6' or
'mdadm --examine /dev/sd...' as per:
If you went to the lengths to write 'dd' expressions, you might
as well have saved the output of '--examine'. Perhaps you did,
but if you did not attach that output to your request for help
it would be rather "stunning".

Yes, I saved the -E/--examine information, "just in case". :-)

Before performing a re-create of the array, I did, in fact, print the contents (-E, --examine) of the metadata stored on each device:

# cat mdadm.e.bak

/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 12a56302:5b436263:1b841be2:fccd07ed

    Update Time : Fri Nov  7 00:37:26 2014
       Checksum : d7063845 - correct
         Events : 667126

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 0
   Array State : A.A.AA.AAA. ('A' == active, '.' == missing)

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 0416e499:16488db2:5473119d:1a0c8141

    Update Time : Sun Nov  2 12:24:42 2014
       Checksum : cd22e98b - correct
         Events : 667122

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 6
   Array State : A.A.AAAAAA. ('A' == active, '.' == missing)

/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 56f35811:d62afc50:a893a3af:10f01367

    Update Time : Fri Nov  7 00:37:26 2014
       Checksum : 1b299f9b - correct
         Events : 667126

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 2
   Array State : A.A.AA.AAA. ('A' == active, '.' == missing)

/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 63f4d908:16f38b7f:ebd9a1d7:0f186e56

    Update Time : Sun Nov  2 10:23:32 2014
       Checksum : 5896c904 - correct
         Events : 667118

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 1
   Array State : AAAAAAAAAA. ('A' == active, '.' == missing)

/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : ee4ac68b:2152463c:b0d72a12:4da24489

    Update Time : Sun Nov  2 10:23:32 2014
       Checksum : 59d06a2 - correct
         Events : 667118

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 3
   Array State : AAAAAAAAAA. ('A' == active, '.' == missing)

/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 72ee0230:51b42c7a:3327c930:302be14e

    Update Time : Sun Nov  2 08:35:01 2014
       Checksum : cbfacb4a - correct
         Events : 667100

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 10
   Array State : .AAAAAAAAAA ('A' == active, '.' == missing)

/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 429cfff7:ecadc967:40f73261:bef9656e

    Update Time : Fri Nov  7 00:37:26 2014
       Checksum : d17f38ee - correct
         Events : 667126

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 9
   Array State : A.A.AA.AAA. ('A' == active, '.' == missing)

/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 6dea792a:f1117c0c:ac16951c:a8b61783

    Update Time : Fri Nov  7 00:37:26 2014
       Checksum : 78bfc76c - correct
         Events : 667126

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 4
   Array State : A.A.AA.AAA. ('A' == active, '.' == missing)

/dev/sdj1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 4b37d852:2236e8e6:15c52c77:4214f7de

    Update Time : Fri Nov  7 00:37:26 2014
       Checksum : 32014484 - correct
         Events : 667126

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 7
   Array State : A.A.AA.AAA. ('A' == active, '.' == missing)

/dev/sdk1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : aa149905:9cd207c4:4bb4c244:3f502348

    Update Time : Fri Nov  7 00:37:26 2014
       Checksum : f8a3e98f - correct
         Events : 667126

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 8
   Array State : A.A.AA.AAA. ('A' == active, '.' == missing)

/dev/sdl1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6958450b:e4dfa2f3:259ff733:f343a584
           Name : server:6  (local to host server)
  Creation Time : Sat Apr 23 06:22:23 2011
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 59a2393b:27209cc2:1f6fa576:5ed6e2a7

    Update Time : Fri Nov  7 00:37:26 2014
       Checksum : be7b7d99 - correct
         Events : 667126

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 5
   Array State : A.A.AA.AAA. ('A' == active, '.' == missing)
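
For quick reference, the per-device essentials of the above can be pulled out in one pass (a trivial sketch against the same saved file):

# grep -E '^/dev/|Events|Update Time|Device Role|Array State' mdadm.e.bak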

There is something strange: what you report being the output of
'--detail' from July:

       Array Size : 17581609728 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 1953512192 (1863.01 GiB 2000.40 GB)

and the output of '--detail' for the re-created:

       Array Size : 17580439296 (16766.01 GiB 18002.37 GB)
    Used Dev Size : 1953382144 (1862.89 GiB 2000.26 GB)

Neither pair of numbers matches; they are *slightly* different. In
particular it is rather strange that the "Used Dev Size" is
different. How is that possible? Have the disks shrunk a little
in the meantime?

Peter, that is an excellent observation! Indeed, the above -E/--examine data confirms that some disks have a data offset of 272 sectors, while the others have 2048, for example:

/dev/sdj1:
Data Offset : 272 sectors
Super Offset : 8 sectors

/dev/sdk1:
Data Offset: 2048 sectors
Super Offset: 8 sectors

The current breakdown:

# grep "272 sectors" mdadm.e.bak | wc -l
2

# grep "2048 sectors" mdadm.e.bak | wc -l
9

Therefore, based on the backup -E/--examine data, two of the 11 disks have a data offset of 272 sectors, while the remaining nine use 2048.
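
The per-device mapping can be pulled from the same saved file with a one-liner (a sketch):

# awk '/^\/dev\//{dev=$1} /Data Offset/{print dev, $4, $5}' mdadm.e.bak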

Could this explain the discrepancy you observed?

For the record, every disk is GPT (GUID) partitioned, and all partitions are identical in size in sectors, regardless of the Seagate HDD model.

Here is a sample of the partition data:

# parted /dev/sdb unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sdc unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sdd unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sde unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sdf unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sdg unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sdh unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s  ntfs         primary  raid

# parted /dev/sdi unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sdj unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sdk unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid

# parted /dev/sdl unit s print | grep -A1 Number
Number  Start  End          Size         File system  Name     Flags
 1      2048s  3907028991s  3907026944s               primary  raid
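
The same sample can also be collected in one pass (a sketch, looping over the member disks):

# for d in /dev/sd[b-l]; do echo "== $d"; parted -s $d unit s print | grep -A1 Number; done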

My only explanation is that this offset discrepancy may have something to do with the age of the array, which was originally created in 2011.

This server was originally running Ubuntu 10.04 LTS (I believe) before eventually being upgraded to 12.04 LTS, although the server has been running healthily on 12.04 LTS for several years without issue.

If memory serves, the older version of mdadm that shipped with 10.04 LTS did a myriad of things differently regarding the location of the superblock(s), offset(s), etc., but I can't say for sure. Did older mdadm builds ever use data offsets of 272 sectors, rather than 2048?
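
For what it's worth, the mdadm build currently on the box is easy to confirm (a sketch, assuming a Debian/Ubuntu system):

# mdadm --version
# dpkg -s mdadm | grep -i '^version'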

Perhaps Neil could comment? :-)

I do hope that supplying the -E/--examine information will be useful to you all. What's the next step?

Thank you for all your efforts and for your keen eyes.

-xar
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



