RAID6 Reshape Gone Awry

Apologies in advance if this is the wrong place for this...

I'd been running a RAID6 with five 1.5TB drives on CentOS 5.ancient for quite a while. Last week I wanted to add a drive, and promptly ran into issues with my CentOS mdadm being unable to do the obvious thing with mdadm --grow, so I upgraded to Ubuntu 12.04 LTS.

All was well, briefly.

My RAID6 is actually a little bit odd in that the drives are split into 10 partitions: all the partition 5s form one RAID6, all the partition 6s form another, and so on, with an LVM layer sitting on top. This turned out to be handy when I changed the size of the drives in the array, so I stuck with it.
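In case the layered setup isn't clear, it looks roughly like this -- drive letters and LVM names here are illustrative, not my exact configuration:

   # one RAID6 per partition number, across all five original drives
   mdadm --create /dev/md5 --level=6 --raid-devices=5 /dev/sd[bdefg]5
   mdadm --create /dev/md6 --level=6 --raid-devices=5 /dev/sd[bdefg]6
   # ...and so on for the remaining partitions...
   # LVM then pools all the md arrays into a single volume group
   pvcreate /dev/md5 /dev/md6
   vgcreate backup_vg /dev/md5 /dev/md6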

This means I actually have to issue ten mdadm --grow commands. My original cunning plan was to issue one, wait for that array to finish reshaping, issue the next, and so on. I scripted this -- and made a mistake, so the 'wait' step never happened. I ended up with all ten arrays grown to six drives, most of them marked as pending reshape.
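For what it's worth, here's roughly what the script was *supposed* to do -- grow one array, block until its reshape finishes, then move on (the new drive is shown as sdh purely for illustration):

   #!/bin/sh
   for n in 5 6 7 8 9 10 11 12 13 14; do
       mdadm --add  /dev/md$n /dev/sdh$n          # new drive's partition goes in as a spare
       mdadm --grow /dev/md$n --raid-devices=6    # kick off the 5->6 reshape
       mdadm --wait /dev/md$n                     # the step my broken version skipped
   done

My broken version fired off all ten grows back to back instead.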

Again, all was well.

But you can guess what happened next, can't you? That's right, the machine crashed. On reboot, the reshape that had been underway at the time (partition 7) picked up and carried on just fine. But partition 8 didn't. Nor anything after.
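For the record, what I'm going by here is /proc/mdstat plus mdadm --detail:

   cat /proc/mdstat            # md7 showed its reshape running again after the reboot;
                               # md8 and up just show up as inactive
   mdadm --detail /dev/md7     # confirms that array came back healthy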

So at this point partitions 5, 6, and 7 are happy; 8-14 are marked inactive. The initial mdadm --grow reported passing the critical section for every partition long before the machine crashed. mdadm --examine on the individual drives shows that each of these arrays believes it is part of a six-drive RAID6, with correct checksums everywhere and identical event counters, but:

1)  Trying e.g.

   sudo mdadm --assemble --force /dev/md8 /dev/sd[bdefgh]8

says

mdadm: Failed to restore critical section for reshape, sorry.
     Possibly you needed to specify the --backup-file

Given that I didn't specify --backup-file to the initial mdadm --grow, this seems... perhaps not entirely helpful.

2) In a working partition, the 'this' entry in mdadm --examine's output always matches the drive being read (e.g. /dev/sde5 says 'this' is /dev/sde5). In a _non_-working partition, that's not the case (e.g. /dev/sdb7 says 'this' is /dev/sdg7).

3) Finally, all the working partitions show that their superblocks are version 0.90.00, but all the non-working partitions show 0.91.00.
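In case it helps, this is the quick loop I've been using to eyeball those fields across the member drives (the device list is just the example from above):

   for d in /dev/sd[bdefgh]8; do
       echo "== $d"
       sudo mdadm --examine "$d" | grep -E 'Version|Events|Checksum|this'
   done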

I've been beating my head on this for a while, Googling around, learning a fair amount but not getting very far. In theory there's nothing on this array that's irreplaceable (it's meant as a backup, not a primary store), but, well, it'd be nice to repair it rather than blow it away.

This is mdadm 3.2.3. Suggestions very welcome. I can provide whatever output people would like to see, of course, but figured I'd wait for requests...

Thanks!

-- Flynn

--
Never let your sense of morals get in the way of doing what's right.
                                                           (Isaac Asimov)


