Re: mdadm expanded 8 disk raid 6 fails in new server, 5 original devices show no md superblock

Hi Phil,

thanks again for bearing with me.

> >
> >>> Model: ATA ST3000DM001-9YN1 (scsi)
>
> Aside: This model looks familiar.  I'm pretty sure these drives are
> desktop models that lack scterc support.  Meaning they are *not*
> generally suitable for raid duty.  Search the archives for combinations
> of "timeout mismatch", "scterc", "URE", and "scrub" for a full
> explanation.  If I've guessed correctly, you *must* use the driver
> timeout work-around before proceeding.
>

Yes, I did, and smartctl showed no significant problems. The ten-year-old
server (a Supermicro enterprise-grade dual Xeon with 8 GB of ECC RAM) had
started to cause problems in early January, which is why I wanted to move
the drives to a new server in the first place and then transfer the data
to a new set of enterprise-grade disks. I had checked the memory and the
disks in a burn-in over several days, including timeout and power-saving
behaviour, before I set up the raid in 2012/2013, and had no issues then.
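
In case it matters, here is what I understand the driver timeout workaround
to be (pieced together from the list archives); please correct me if this is
not what you meant. The 180 second value and the use of sd[a-h] are just my
assumptions:

[root@livecd mnt]# smartctl -l scterc /dev/sda    # check for SCT ERC support
[root@livecd mnt]# for d in /sys/block/sd[a-h]/device/timeout ; do echo 180 > "$d" ; done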

One of the reasons I tend to use mdadm is that it lets me utilize existing
hardware for bridging solutions until money comes in for better hardware,
and moving an mdadm raid has so far never caused a serious problem.

> > Attached you will find hexdumps of 64k of /dev/sd[a-h]2 at sector 0 and
> > at 262144, which show the 1.2 superblock on sd[fgh]2 but not on sd[a-e]2;
> > they may also help to identify the data_offset. I suspect it is 2048 on
> > sd[a-e]2 and 262144 on sd[fgh]2.
> >
>
> Jackpot!  LVM2 embedded backup data at the correct location for mdadm
> data offset == 262144.  And on /dev/sda2, which is the only device that
> should have it (first device in the raid).
>
> From /dev/sda2 @ 262144:
>
> > 00001200  76 67 5f 6e 65 64 69 67  73 30 32 20 5d 0a 69 64  |vg_nedigs02 ].id|
> > 00001210  20 3d 20 22 32 4c 62 48  71 64 2d 72 67 42 9f 6e  | = "2LbHqd-rgB.n|
> > 00001220  45 4a 75 31 2d 32 52 36  31 2d 41 35 f5 75 2d 6e  |EJu1-2R61-A5.u-n|
> > 00001230  49 58 53 2d 66 79 4f 36  33 73 22 0a 73 65 3a 01  |IXS-fyO63s".se:.|
> > 00001240  6f 20 3d 20 33 36 0a 66  6f 72 6d 61 ca 24 3d 20  |o = 36.forma.$= |
> > 00001250  22 6c 76 6d 32 22 20 23  20 69 6e 66 6f 72 6b ac  |"lvm2" # infork.|
> ...
> > 00001a70  20 31 33 37 35 32 38 37  39 37 39 09 23 20 d2 32  | 1375287979.# .2|
> > 00001a80  64 20 4a 75 6c 20 33 31  20 31 38 3a af 37 3a 31  |d Jul 31 18:.7:1|
> > 00001a90  39 20 32 30 31 33 0a 0a  00 00 00 00 00 00 ee 12  |9 2013..........|
>
> Note the creation date/time at the end (with a corrupted byte):
>
> Jul 31 18:?7:19 2013
>
> There are other corrupted bytes scattered around.  I'd be worried about
> the RAM in this machine.  Since you are using non-enterprise drives, I'm
> going to go out on a limb here and guess that the server doesn't have
> ECC ram...
See above: the old server does have 8 GB of ECC RAM.
> Consider performing an extended memcheck run to see what's going on.
> Maybe move the entire stack of disks to another server.
>
That's what I did initially; I moved the disks back because the assembly
failed in the new server, and I will now move them into the new server
again before proceeding.
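
(For completeness: the attached 64k dumps at sector 0 and at the suspected
data offset can be reproduced with something like the two commands below;
128 x 512-byte sectors = 64k.)

[root@livecd mnt]# dd if=/dev/sda2 bs=512 count=128 2>/dev/null | hexdump -C
[root@livecd mnt]# dd if=/dev/sda2 bs=512 count=128 skip=262144 2>/dev/null | hexdump -C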

> Based on the signature discovered above, we should be able to --create
> --assume-clean with the modern default data offset.  We know the
> following device roles:
>
> /dev/sda2 == 0
> /dev/sdf2 == 5
> /dev/sdg2 == 6
> /dev/sdh2 == spare
>
> So /dev/sdh2 should be left out until the array is working.
>
> Please re-execute the "mdadm -E" reports for /dev/sd[fgh]2 and show them
> uncut.  (Use the latest mdadm.)  That should fill in the likely device
> order of the remaining drives.

[root@livecd mnt]# mdadm -E /dev/sd[fgh]2

/dev/sdf2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
           Name : 1
  Creation Time : Wed Jul 31 18:24:38 2013
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
     Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
  Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : d5a16cb2:ff41b9a5:cbbf12b7:3750026d

    Update Time : Mon Dec 16 01:16:26 2013
       Checksum : ee921c43 - correct
         Events : 327

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 5
   Array State : A.AAAAA ('A' == active, '.' == missing)
/dev/sdg2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
           Name : 1
  Creation Time : Wed Jul 31 18:24:38 2013
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
     Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
  Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : a1e1e51b:d8912985:e51207a9:1d718292

    Update Time : Mon Dec 16 01:16:26 2013
       Checksum : 4ef01fe9 - correct
         Events : 327

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 6
   Array State : A.AAAAA ('A' == active, '.' == missing)


/dev/sdh2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
           Name : 1
  Creation Time : Wed Jul 31 18:24:38 2013
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
     Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
  Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 030cb9a7:76a48b3c:b3448369:fcf013e1

    Update Time : Mon Dec 16 01:16:26 2013
       Checksum : a1330e97 - correct
         Events : 327

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : spare
   Array State : A.AAAAA ('A' == active, '.' == missing)

> Also, it is important that you document which drive serial numbers are
> currently occupying the different device names.  An excerpt from "ls -l
> /dev/disk/by-id/" would do.

scsi-SATA_ST3000DM001-9YN_S1F026VJ -> ../../sda
scsi-SATA_ST3000DM001-9YN_W1F0TB3C -> ../../sdb
scsi-SATA_ST3000DM001-9YN_S1F04KAK -> ../../sdc
scsi-SATA_ST3000DM001-9YN_W1F0RWJY -> ../../sdd
scsi-SATA_ST3000DM001-9YN_S1F08N7Q -> ../../sde
scsi-SATA_ST3000DM001-9YN_Z1F1F3TC -> ../../sdf
scsi-SATA_ST3000DM001-9YN_W1F1ZZ9T -> ../../sdg
scsi-SATA_ST3000DM001-9YN_Z1F1X0AC -> ../../sdh


> I have to admit that I'm very concerned about your corrupted LVM
> signature at offset 262144.  LVM probably won't recognize your PV once
> the array is assembled correctly, making it difficult to
> non-destructively test the filesystems on your logical volumes.  You may
> have to duplicate your disks onto new ones so that an LVM restore can be
> safely attempted.
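
Understood. For the 1:1 copies I assume something like GNU ddrescue is the
right tool; my working plan, with /dev/sdX as an old member and /dev/sdY as
its new copy (placeholder names), would be roughly:

[root@livecd mnt]# ddrescue -f -n /dev/sdX /dev/sdY sdX.map

i.e. one pass per drive, keeping the map file in case a drive turns out to
have read errors after all.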

> Do *not* buy desktop drives!  You need raid-capable drives like the WD
> Red at the least.

;-) Already ordered WD Reds; they should be delivered any time now. I
guess I have now reached that level after years of making do with very
limited budgets.

I am a bit more relaxed now because I found that a scheduled transfer of
the data to the university tape robot had completed before Christmas, so
this local archive mirror is (luckily) not critical. I still want to
understand whether all this is just the result of shaky hardware or an
mdadm (misuse) issue. Losing the superblocks on five drives of a large
software raid 6, rather than just a few corrupted bytes, is not something
I would like to repeat any time soon by, e.g., mishandling mdadm.

So we have:

Wed Jul 31 18:24:38 2013 on sd[f-h]2 for the creation of the raid6, and
Wed Jul 31 18:?7:19 2013 for the creation of the LVM volume group,

which could well fit together.

So I will move the disks to the new server, make 1:1 copies onto new
drives, and then attempt the --create --assume-clean. In which device
order should I list them?
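
My guess at the command, based on what we know so far, would be something
like the sketch below, with the four question marks standing for sd[b-e]2
in whatever order turns out to be correct, sdh2 left out as the former
spare, and /dev/md1 only assumed from the Name field above:

[root@livecd mnt]# mdadm --create /dev/md1 --assume-clean --metadata=1.2 \
        --level=6 --raid-devices=7 --chunk=256 --layout=left-symmetric \
        /dev/sda2  ?  ?  ?  ?  /dev/sdf2  /dev/sdg2

Please correct me before I run anything like this on the copies.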

Thanks so much, I have learned a lot already.

Regards


Julian



Universitätsklinikum Jena - Bachstrasse 18 - D-07743 Jena
The legally required disclosures can be found at http://www.uniklinikum-jena.de/Pflichtangaben.html