RE: RAID-6 aborted reshape

Good news: I was able to recover the data. I am actively copying it off
right now.

This was the drive layout:
RAID Drive	Start sector
0		6144
1		6144
2		4096
3		6144
4		4096
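
(If the original superblocks are still on the members, those start sectors
should also be cross-checkable against the "Data Offset" field that
mdadm --examine reports for 1.x metadata; the device names below are
placeholders.)

  # print the data offset (in 512-byte sectors) each member's superblock claims
  for d in /dev/sd[a-e]; do
      echo "== $d =="
      mdadm --examine "$d" | grep -i 'data offset'
  done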

Once I got the order right (by playing around in UFS Explorer), the XFS
filesystem was cleanly readable.

I think when the array was originally created in early 2014 it started out
as a 3-disk RAID-5. It was later grown to a 4-disk RAID-6 or RAID-5, and
finally grown to a 5-disk RAID-6. It had also previously had one failed disk
on RAID drive 2.
My best guess is that the original three drives and the added fourth had
2048 sectors of padding past the superblock, or that their superblock area
was that much larger. The last drive added has its data start immediately
after the 4096 sectors used for the superblock, and the same is true for
the RAID drive 2 replacement.

Is there any way to re-create the array (keeping the data intact) with this
same layout so I can access the data under Linux? While I am currently
copying the data off via UFS Explorer, I'd prefer to mount it under Linux.
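
What I had in mind is roughly the following (a sketch only: device names,
member order, chunk size and layout are guesses I'd have to confirm first,
and everything runs against copy-on-write overlays so the real disks are
never written):

  # 1) overlay each member with the usual dm snapshot recipe
  for d in sda sdb sdc sdd sde; do
      size=$(blockdev --getsz /dev/$d)
      truncate -s 4G /tmp/overlay-$d
      loop=$(losetup -f --show /tmp/overlay-$d)
      dmsetup create $d-ov --table "0 $size snapshot /dev/$d $loop P 8"
  done

  # 2) hide the extra 2048 sectors (1 MiB) on the members whose data starts
  #    at sector 6144, so every member presents the same 4096-sector offset
  off=$((2048 * 512))
  l0=$(losetup -f --show --offset $off /dev/mapper/sda-ov)   # slot 0
  l1=$(losetup -f --show --offset $off /dev/mapper/sdb-ov)   # slot 1
  l3=$(losetup -f --show --offset $off /dev/mapper/sdd-ov)   # slot 3

  # 3) re-create on top of the overlays only; --data-offset is taken in KiB,
  #    2048 KiB = 4096 sectors
  mdadm --create /dev/md100 --assume-clean --metadata=1.2 \
        --level=6 --raid-devices=5 --chunk=512 --layout=left-symmetric \
        --data-offset=2048 \
        "$l0" "$l1" /dev/mapper/sdc-ov "$l3" /dev/mapper/sde-ov

  mount -o ro /dev/md100 /mnt

Would that be the right direction, or is there a cleaner way to tell mdadm
about the two different per-member data offsets directly?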

Thanks,
-Colt

-----Original Message-----
From: Andreas Klauer <Andreas.Klauer@xxxxxxxxxxxxxx> 
Sent: Tuesday, June 11, 2019 5:06 PM
To: Colt Boyd <coltboyd@xxxxxxxxx>
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: RAID-6 aborted reshape

On Tue, Jun 11, 2019 at 12:22:53PM -0500, Colt Boyd wrote:
> > Jun  6 10:12:25 OMV1 kernel: [    2.142877] md/raid:md0: raid level 6 active with 5 out of 5 devices, algorithm 2
> > Jun  6 10:12:25 OMV1 kernel: [    2.196783] md0: detected capacity change from 0 to 8998697828352
> > Jun  6 10:12:25 OMV1 kernel: [    3.885628] XFS (md0): Mounting V4 Filesystem
> > Jun  6 10:12:25 OMV1 kernel: [    4.213947] XFS (md0): Ending clean mount
> 
> There are also these:
> 
> Jun  6 10:44:47 OMV1 kernel: [  449.554738] md0: detected capacity change from 0 to 11998263771136
> Jun  6 10:44:48 OMV1 postfix/smtp[2514]: 9672F6B4: replace: header Subject: DegradedArray event on /dev/md0:OMV1: Subject: [OMV1.veldanet.local] DegradedArray event on /dev/md0:OMV1
> Jun  6 10:46:25 OMV1 kernel: [  547.047912] XFS (md0): Mounting V4 Filesystem
> Jun  6 10:46:28 OMV1 kernel: [  550.226215] XFS (md0): Log inconsistent (didn't find previous header)
> Jun  6 10:46:28 OMV1 kernel: [  550.226224] XFS (md0): failed to find log head

See, this is very odd.

You had an md0 device with a capacity of 8998697828352 bytes.

In a 5-disk RAID-6 that comes down to 2999565942784 bytes per data disk.

But then (half an hour later) you have a RAID-6 with a capacity of 11998263771136 bytes.

It went up by 2999565942784... one data disk's worth.
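
(Quick check of the arithmetic: a 5-disk RAID-6 has 3 data disks, a 6-disk RAID-6 has 4.)

  $ echo $(( 8998697828352 / 3 ))
  2999565942784
  $ echo $(( 2999565942784 * 4 ))
  11998263771136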

Now, the way growing a RAID works, you only get to use the added capacity
once the reshape is finished. Otherwise you would still have old data sitting
in the places new data has to go, and they would overwrite each other. So you
can't use the extra capacity before the reshape completes.

So for some reason your RAID believed the reshape to be complete, whether or
not that was actually the case - the mount failure suggests it went very
wrong somehow.

So it didn't work as a 6-drive array, and not when re-created as a 5-drive
array either; I guess you have to look at the raw data to figure out whether
it makes any sense at all (find an offset that has non-zero, non-identical
data across all drives, and see whether the parity looks like a 5-disk or a
6-disk array).
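
A crude, read-only way to start eyeballing that (device names, the position,
and the per-member data offsets are example values only; for an actual parity
check you would XOR the data blocks of a stripe and compare against the P
block):

  members=(/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde)
  offsets=(6144 6144 4096 6144 4096)   # each member's data offset in sectors
  pos=1234567                          # some array-relative position, in 512-byte sectors

  # dump 4 KiB from the same array-relative position on every member
  for i in "${!members[@]}"; do
      echo "== ${members[$i]} =="
      dd if="${members[$i]}" bs=512 skip=$(( offsets[i] + pos )) count=8 2>/dev/null | xxd | head -n 4
  done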

If it's both (6-drive for lower and 5-drive for higher offsets), then it
would still be stuck mid-reshape after all, and you'd have to create both
(two sets of overlays), find the reshape position, and stitch them together
with dm-linear.
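
Something in this direction, assuming md100 is the array re-created with the
6-drive layout and md101 the one re-created with the 5-drive layout (each on
its own set of overlays), and the split point is the reshape position you
found:

  split=1234567890                  # reshape position in 512-byte sectors, hypothetical
  total=$(blockdev --getsz /dev/md101)

  # below the reshape position read from the 6-drive interpretation,
  # above it from the 5-drive one
  {
      echo "0 $split linear /dev/md100 0"
      echo "$split $(( total - split )) linear /dev/md101 $split"
  } | dmsetup create stitched

  mount -o ro /dev/mapper/stitched /mnt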

Or some such method... that's all assuming it wasn't mounted and didn't get
corrupted in other ways while it was in a bad state, that there were no bugs
in the kernel itself, and so on.

Good luck

Andreas Klauer



