Re: RAID1 assembled broken array

Doug Ledford <dledford@xxxxxxxxxx> · Tue, 15 Sep 2009 16:00:30 -0400

On Sep 15, 2009, at 3:22 PM, Matthias Urlichs wrote:
I had a somewhat strange error today.

One of my servers has a RAID1 array. Two partitions at the end of the
disk; the RAID superblocks are at the end of the partition.

After a hard reboot today, one of the disks managed to not have its
partition table scanned correctly, most probably because the disk was
hung and the ("intelligent") controller got confused about it. After the
initial scan, however, it came up correctly.

This error caused mdadm to "successfully" build a RAID1 from /dev/sda3
and /dev/sdb (instead of /dev/sdb3). Needless to say, the resulting
volume was somewhat unusable. To say the least.

My server's mdadm.conf has a 'DEVICE partitions' line. I suppose that
replacing this with a pattern that explicitly only matches partitions,
not disks, would make the problem go away, and that the lesson from
today's disaster recovery effort is to always explicitly list the allowed
partition names, instead of being lazy and using 'DEVICE partitions'.

Wrong lesson.  The correct lesson to gather from this is to prefer
version 1.1 or 1.2 superblocks wherever possible.  A superblock at the
beginning of the partition simply disappears when the partition table is
not read, so nothing gets assembled from the bare disk.  A superblock at
the end of the last partition, by contrast, sits where a superblock for
the whole device would be, so it can be mistaken for one when there is
no partition table.
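
To make the offsets concrete, here is a rough sketch of the arithmetic.
It assumes the 0.90 layout as I understand it (superblock 64 KiB-aligned,
with 64 KiB reserved at the end of the device) and a 1.2 superblock 4 KiB
from the start. The disk and partition sizes are made up for
illustration; they are not the ones from the affected server.

```python
SECTOR = 512
MD_RESERVED_SECTORS = 64 * 1024 // SECTOR  # 64 KiB reserved at the end (0.90)


def sb_offset_090(size_sectors):
    """Sector offset of a 0.90 superblock within a device of this size."""
    return (size_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS


def sb_offset_12(size_sectors):
    """Sector offset of a 1.2 superblock: 4 KiB from the start, size-independent."""
    return 4096 // SECTOR


# Hypothetical layout: the last partition starts on a 64 KiB boundary
# and runs to the end of the disk.
disk_sectors = 1_000_000
part_start = 512_000
part_sectors = disk_sectors - part_start

# Absolute on-disk sector where each lookup expects a superblock.
end_sb_via_partition = part_start + sb_offset_090(part_sectors)
end_sb_via_disk = sb_offset_090(disk_sectors)
start_sb_via_partition = part_start + sb_offset_12(part_sectors)
start_sb_via_disk = sb_offset_12(disk_sectors)

# End-of-device superblock: scanning the bare disk lands on the same sector.
print(end_sb_via_partition == end_sb_via_disk)
# Start-of-device superblock: the bare disk is probed somewhere else entirely.
print(start_sb_via_partition == start_sb_via_disk)
```

With these numbers the end-of-device lookups coincide, so the bare disk
looks like an array member. The start-of-device lookups do not coincide,
so a disk with a missing partition table shows no superblock at all.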

--

Doug Ledford <dledford@xxxxxxxxxx>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband
