Re: md metadata nightmare

On Tue, 22 Nov 2011 18:05:21 -0600 Kenneth Emerson
<kenneth.emerson@xxxxxxxxx> wrote:

> NOTE: I have set the linux-raid flag on all of the partitions in the
> GPT. I think I have read in the linux-raid archives that this is not
> recommended. Could this have had an affect on what transpired?

Not recommended, but also totally ineffective.  The Linux-RAID partition type
is only recognised in MS-DOS partition tables.

> 
> So my question is:
> 
> Is there a way, short of backing up the data, completely rebuilding
> the arrays, and restoring the data (a real PIA) to rewrite the
> metadata given the existing array configurations in the running
> system?  Also, is there an explanation as to why the metadata seems so
> screwed up that the arrays cannot be assembled automatically by the
> kernel?

There appear to be two problems here.  Both could be resolved by converting to
v1.0 metadata, but there are other approaches, and converting to v1.0 is not
trivial (not enough developers to work on all the tasks!).

One problem is that the final partition on at least some of your disks ends on
a 64K-aligned boundary at the end of the disk.  Since the v0.90 superblock is
stored in the last 64K-aligned block of a device, the same superblock looks
valid both for the whole device and for the partition.
You can confirm this by running
  mdadm --examine /dev/sda
  mdadm --examine /dev/sda4

(ditto for b,c,d,e,...)

The "sdX4" should show a superblock; the "sdX" should not.
I expect both will show exactly the same superblock.  If they show different
superblocks, that would be interesting.
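To avoid typing each pair by hand, a small sketch that just prints the
commands to run (drive letters a-e assumed from your earlier output; adjust
to match your system, and run the printed commands as root):

```shell
# Print the pair of --examine commands to run for each disk.
examine_pairs() {
  for d in "$@"; do
    echo "mdadm --examine /dev/sd$d"     # whole disk: should show no superblock
    echo "mdadm --examine /dev/sd${d}4"  # last partition: should show one
  done
}
examine_pairs a b c d e
```

Comparing the two outputs for each disk (e.g. with diff) will show whether
the whole device and the partition really report the same superblock.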

If I am correct here then you can "fix" this by changing mdadm.conf to read:

DEVICE /dev/sda? /dev/sdb? /dev/sdc? /dev/sdd? /dev/sde?
or
DEVICE /dev/sd[abcde][1-4]

or similar, i.e. tell mdadm to ignore the whole devices and scan only the
partitions.
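You can sanity-check the glob itself without touching any devices: bash's
"case" statement uses the same shell-glob syntax that is applied to DEVICE
entries in mdadm.conf.  A quick sketch:

```shell
# Check which device names the proposed DEVICE glob would cover.
matches() { case "$1" in /dev/sd[abcde][1-4]) echo match;; *) echo skip;; esac; }
matches /dev/sda4   # partition   -> match
matches /dev/sda    # whole disk  -> skip
```

The whole-disk names fall through to "skip", which is exactly what you want
here.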

The other problem is that v0.90 metadata doesn't cope well with very large
devices.  It has only 32 bits to record the size of each device in kilobytes.
That should allow 4TB per device, but due to a bug (relating to sign bits) it
only works reliably up to 2TB per device.  The bug was introduced in 2.6.29
and removed in 3.1.

So if you can run a 3.1.2 kernel, that would be best.

You could convert to v1.0 if you want.  You only need to do this for the last
partition (sdX4).

Assuming nothing has changed since the "--detail" output you provided, you
should:

 mdadm -S /dev/md3
 mdadm -C /dev/md3 --metadata=1.0 --chunk=64k --level=6 --raid-devices=5 \
      missing /dev/sdb4 /dev/sdc4 /dev/sda4 /dev/sdd4 \
      --assume-clean

The order of the disks is important.  You should compare it with the output
of "mdadm --detail" before you start, to ensure that it is correct and that I
have not made any typos.  You should of course check the rest of the command
as well.
After doing this (and possibly before) you should 'fsck' to ensure the
transition was successful.  If anything goes wrong, ask before risking
further breakage.
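One way to double-check the slot order is to save the "mdadm --detail
/dev/md3" output first and pull the member devices out in slot order.
A sketch of the extraction (the sample lines below are invented for
illustration, not this array's real layout):

```shell
# Given saved `mdadm --detail` output, print the member devices in slot
# order -- the order that must be reproduced on the --create line.
# (Sample data only; use your own saved --detail output.)
sample='    0       8       20        0      active sync   /dev/sdb4
    1       8       36        1      active sync   /dev/sdc4
    2       0        0        2      removed'
printf '%s\n' "$sample" | awk '$NF ~ /^\/dev\// { print $NF }'
```

A "removed" slot produces no device line, which is where "missing" goes on
the --create command.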

Good luck.

NeilBrown


