mdadm: expanded 8-disk RAID 6 fails in new server, 5 original devices show no md superblock

Dear all, dear Neil (thanks for pointing me to this list),

I am in desperate need of help. mdadm is fantastic work, and I have
relied on it for years to run very stable server systems; I have never
had major problems I could not solve.

This time it's different:

On a CentOS 6.x system (I can't remember the exact version), initially in 2012:

I used parted to create GPT partitions on 5 Seagate drives, 3 TB each.
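
The partition layout below is in sector units, as printed by something like:

parted /dev/sda unit s print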

Model: ATA ST3000DM001-9YN1 (scsi)
Disk /dev/sda: 5860533168s  # sd[bcde] identical
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start     End          Size         File system  Name     Flags
1      2048s     1953791s     1951744s     ext4                  boot
2      1955840s  5860532223s  5858576384s               primary  raid

With an mdadm version I no longer remember, including offset parameters
(now unknown) for 4 KiB alignment, I created the following (rough
command sketch below):

/dev/sd[abcde]1 as /dev/md0, RAID 1 for booting (1 GB)
/dev/sd[abcde]2 as /dev/md1, RAID 6 for data (9 TB), used as an LVM physical volume
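
Roughly, the creation would have looked like this (a sketch only; the
exact mdadm version and the offset/alignment options actually used are
unknown):

mdadm --create /dev/md0 --level=1 --raid-devices=5 /dev/sd[abcde]1
mdadm --create /dev/md1 --level=6 --raid-devices=5 /dev/sd[abcde]2
pvcreate /dev/md1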

Later I added 3 more identical 3 TB Seagate drives with an identical
partition layout, but newer firmware.

Using what was most likely a different, newer version of mdadm, I grew
the RAID 6 by 2 drives and added 1 spare, as sketched below.
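
That would have been roughly (again a sketch; the exact invocation was
not recorded):

mdadm --add /dev/md1 /dev/sdf2 /dev/sdg2 /dev/sdh2
mdadm --grow /dev/md1 --raid-devices=7

Growing from 5 to 7 active devices leaves the third added disk as a
spare, which matches the -E output below.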

/dev/md1 was then at 15 TB gross, 13 TB usable, and I expanded the PV.

It ran fine.

Then I moved the 8 disks to a new server with an HBA and backplane. The
array did not start because mdadm did not find the superblocks on the
original 5 devices, /dev/sd[abcde]2. Moving the disks back to the old
server did not make the error vanish. Using a CentOS 6.3 live CD, I got
the following:

[root@livecd ~]# mdadm -Evvvvs /dev/sd[abcdefgh]2
mdadm: No md superblock detected on /dev/sda2.
mdadm: No md superblock detected on /dev/sdb2.
mdadm: No md superblock detected on /dev/sdc2.
mdadm: No md superblock detected on /dev/sdd2.
mdadm: No md superblock detected on /dev/sde2.

/dev/sdf2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
               Name : 1
      Creation Time : Wed Jul 31 18:24:38 2013
         Raid Level : raid6
       Raid Devices : 7

     Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
         Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
      Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : active
        Device UUID : d5a16cb2:ff41b9a5:cbbf12b7:3750026d

        Update Time : Mon Dec 16 01:16:26 2013
           Checksum : ee921c43 - correct
             Events : 327

             Layout : left-symmetric
         Chunk Size : 256K

        Device Role : Active device 5
        Array State : A.AAAAA ('A' == active, '.' == missing)

/dev/sdg2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
               Name : 1
      Creation Time : Wed Jul 31 18:24:38 2013
         Raid Level : raid6
       Raid Devices : 7

     Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
         Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
      Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : active
        Device UUID : a1e1e51b:d8912985:e51207a9:1d718292

        Update Time : Mon Dec 16 01:16:26 2013
           Checksum : 4ef01fe9 - correct
             Events : 327

             Layout : left-symmetric
         Chunk Size : 256K

        Device Role : Active device 6
        Array State : A.AAAAA ('A' == active, '.' == missing)

/dev/sdh2:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
               Name : 1
      Creation Time : Wed Jul 31 18:24:38 2013
         Raid Level : raid6
       Raid Devices : 7

     Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
         Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
      Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : active
        Device UUID : 030cb9a7:76a48b3c:b3448369:fcf013e1

        Update Time : Mon Dec 16 01:16:26 2013
           Checksum : a1330e97 - correct
             Events : 327

             Layout : left-symmetric
         Chunk Size : 256K

        Device Role : spare
        Array State : A.AAAAA ('A' == active, '.' == missing)


I suspect that the superblocks of the original 5 devices are at a
different location, possibly at the end of the partitions because they
were created with a different mdadm version. (Metadata 0.90 and 1.0
superblocks live near the end of the device, while 1.1/1.2 live near
the start.) Booting the drives on the new server with the HBA in IT
(non-RAID) mode may have triggered an initialization at the end of the
partitions on the first five drives: I can hexdump something containing
"EFI PART" in the last 64 KiB of all 8 partitions used for the RAID 6.
That would not have affected the 3 added drives, which still show
metadata 1.2.
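
A read-only way to look for an old-style superblock near the end of a
member would be something like this (a sketch; 5858576384 is the sector
count of partition 2 from the parted output above, and 0.90/1.0
superblocks sit within the last 128 KiB of the device):

SECTORS=5858576384
dd if=/dev/sda2 bs=512 skip=$((SECTORS - 256)) count=256 2>/dev/null | hexdump -C | less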

If any of you can help me sort this out, I would greatly appreciate it.
I guess I need the mdadm version where the data offset can be set
differently for each device, but it doesn't compile; it fails with an
error in sha1.c:

sha1.h:29:22: Fehler: ansidecl.h: Datei oder Verzeichnis nicht gefunden
(the error is German for "ansidecl.h: No such file or directory")
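
(A workaround I still have to try, on the assumption that ansidecl.h is
the header shipped with the binutils development package on RHEL/CentOS:

yum install binutils-devel

and then rebuild.)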

What would be the best way to proceed? There is critical data on this
RAID that is not fully backed up.

(UPDATE)

Thanks for getting back.

Yes, I know it's bad, especially the tweaking without keeping exact
records of versions and offsets.

I am, however, rather sure that nothing was written to the disks when I
plugged them into the NEW server, unless starting up a live CD causes an
automatic assembly attempt that updates the superblocks. That I cannot
exclude.
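
(Two read-only checks for whether the live CD assembled, or tried to
assemble, anything:

cat /proc/mdstat
dmesg | grep -i raid

Neither writes to the disks.)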

What I have done so far, without writing to the disks:

Look for non-zero data at the beginning of sda2:

dd if=/dev/sda skip=1955840 bs=512 count=10 | hexdump -C | grep '[^00]'

gives me

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  1e b5 54 51 20 4c 56 4d  32 20 78 5b 35 41 25 72  |..TQ LVM2 x[5A%r|
00001010  30 4e 2a 3e 01 00 00 00  00 10 00 00 00 00 00 00  |0N*>............|
00001020  00 00 02 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001200  76 67 5f 6e 65 64 69 67  73 30 32 20 7b 0a 69 64  |vg_nedigs02 {.id|
00001210  20 3d 20 22 32 4c 62 48  71 64 2d 72 67 42 74 2d  | = "2LbHqd-rgBt-|
00001220  45 4a 75 31 2d 32 52 36  31 2d 41 35 7a 74 2d 6e  |EJu1-2R61-A5zt-n|
00001230  49 58 53 2d 66 79 4f 36  33 73 22 0a 73 65 71 6e  |IXS-fyO63s".seqn|
00001240  6f 20 3d 20 37 0a 66 6f  72 6d 61 74 20 3d 20 22  |o = 7.format = "|
00001250  6c 76 6d 32 22 20 23 20  69 6e 66 6f 72 6d 61 74  |lvm2" # informat|
(cont'd)

but on /dev/sdb I get:

00000000  5f 80 00 00 5f 80 01 00  5f 80 02 00 5f 80 03 00  |_..._..._..._...|
00000010  5f 80 04 00 5f 80 0c 00  5f 80 0d 00 00 00 00 00  |_..._..._.......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  60 80 00 00 60 80 01 00  60 80 02 00 60 80 03 00  |`...`...`...`...|
00001010  60 80 04 00 60 80 0c 00  60 80 0d 00 00 00 00 00  |`...`...`.......|
00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001400

So my initial guess that the data might start at 00001000 did not pan out.

Does anybody have an idea of how to reliably identify an mdadm
superblock in a hexdump of the drive?
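
One idea I want to try myself (a sketch; it assumes the v1.x magic
a92b4efc is stored little-endian on disk, i.e. as the byte sequence
fc 4e 2b a9, and that the superblock sits at a 4 KiB-aligned offset so
the magic starts a hexdump line):

for d in /dev/sd[abcde]2; do
    echo "== $d =="
    dd if=$d bs=1M count=8 2>/dev/null | hexdump -C | grep 'fc 4e 2b a9'
done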

And second, have I got my numbers right? In parted I see the sector
count, and when I multiply the total count by 512 (not 4096!) I get 3
TB, so I think I have to use bs=512 in dd to get the partition
boundaries correct.
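
A quick self-check of that arithmetic, touching nothing on disk:

echo $((5860533168 * 512))   # 3000592982016 B, i.e. ~3.0 TB -- matches a 3 TB drive
echo $((1955840 * 512))      # 1001390080 B, the byte offset where partition 2 starts

So parted is counting 512-byte logical sectors, and bs=512 with
skip=1955840 should be correct.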

As for the last state: one drive had apparently been set faulty, but the
spare had not been integrated. I may have been caught by a bug described
by Neil Brown, where on shutdown disks were wrongly reported, and
superblock information was subsequently overwritten.

I don't have enough NAS/SAN storage space to make identical copies of
5 x 3 TB, but maybe I should buy 5 more disks and make dd mirrors so
that I have a backup of the current state.
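
Per original disk that would be something like (a sketch; /dev/sdX is a
placeholder for a hypothetical fresh target disk, so source and target
would need quadruple-checking):

dd if=/dev/sda of=/dev/sdX bs=64K conv=noerror,sync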

Again, any help / ideas are welcome, especially with building an mdadm
version that has the data-offset options ...
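
For reference, the kind of invocation I hope to end up with, strictly a
sketch and only after full backups exist: it assumes a recent mdadm
where --data-offset exists and, as I understand it, can vary per device
via --data-offset=variable with the offset appended to each device after
a colon; OFFS1 etc. are placeholders for offsets still to be determined,
and the devices would have to be listed in their original role order.

mdadm --create /dev/md1 --assume-clean --metadata=1.2 --level=6 \
      --chunk=256 --raid-devices=7 --data-offset=variable \
      /dev/sda2:OFFS1 /dev/sdb2:OFFS2 ...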

Julian

Universitätsklinikum Jena - Bachstrasse 18 - D-07743 Jena
The legally required disclosures can be found at http://www.uniklinikum-jena.de/Pflichtangaben.html