Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)

Phil Turmel <philip@xxxxxxxxxx> · Thu, 1 Oct 2020 14:21:46 -0400

Hi David,

Let me add some history from my memory:

On 10/1/20 11:04 AM, David Madore wrote:
On Thu, Oct 01, 2020 at 03:10:21PM +0100, Wols Lists wrote:
Except is this the problem? If the reshape fails to start, I don't quite
see how the restart service-file can be to blame?

I'm confident this is the problem.  I've changed the service file and
the reshape now works fine for loopback devices on my system (I even
tried it on a few small partitions to make sure).

Yes, but see below.

As far as I understand it, here's what happens: when mdadm is given a
reshape command on a system with systemd (and unless
MDADM_NO_SYSTEMCTL is set), instead of handling the reshape itself, it
calls (via the continue_via_systemd() function in Grow.c) "systemctl
restart mdadm-grow-continue@${device}.service" (where ${device} is the
md device base name).  This is defined via a systemd template file
distributed by mdadm, namely
/lib/systemd/system/mdadm-grow-continue@.service which itself calls
(ExecStart) "/sbin/mdadm --grow --continue /dev/%I" (where %I is,
again, the md device base name).  This does not pass a --backup-file
parameter so, when the initial call needed one, this service
immediately terminates with an error message, which is lost because
standard input/output/error are redirected to /dev/null by the service
file.  So the reshape never starts.

The original problem that service file attempts to solve is that mdadm 
doesn't ever do the reshape itself.  In the absence of systemd, mdadm 
always forked a process to do the reshape in the background, passing 
everything necessary.  Systemd likes to kill off child processes when a 
main process ends, so *poof*, no reshape.

I think the way to fix this would be to rewrite the systemd service
file so that it first checks the existence of
/run/mdadm/backup_file-%I and, if it exists, adds it as --backup-file
parameter.  (I don't know how to do this.  For my own system I wrote a
quick fix which assumes that --backup-file will always be present,
which is just as wrong as assuming that it will always be absent.)

Meanwhile, by the time this was fixed, mdadm's defaults pretty much 
ensured that a backup file is never needed.  The temporary space provided 
by the backup file is now only needed when there isn't any leeway in the 
data offsets of the member devices.  Avoiding the backup file is also 
twice as fast.  So the systemd hack service was created without 
allowance for a backup file.

However, your solution of using the RAM-backed /run directory is another 
disaster in the making, as that folder is destroyed on shutdown, totally 
breaking the whole point of the backup file.  It needs to go somewhere 
else, outside of the raid being reshaped and persistent through system 
crashes/shutdown.
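
For illustration only, here is a rough sketch of the conditional logic being
discussed: add --backup-file to the "--grow --continue" call only when a
backup file actually exists.  It is written in Python purely to show the
decision; it is not what the shipped service file does.  The function name
and the backup_dir parameter are hypothetical, and backup_dir is assumed to
be some persistent location outside the array being reshaped (not /run).
The backup_file-<device> naming is taken from the path quoted above.

```python
import os


def continue_command(device, backup_dir):
    """Build the argv for resuming a reshape on /dev/<device>.

    device     -- md device base name, e.g. "md0"
    backup_dir -- hypothetical persistent directory holding backup files
    """
    argv = ["/sbin/mdadm", "--grow", "--continue", "/dev/" + device]
    backup = os.path.join(backup_dir, "backup_file-" + device)
    # Only pass --backup-file when the initial reshape command created one;
    # assuming it is always present is as wrong as assuming it never is.
    if os.path.exists(backup):
        argv.append("--backup-file=" + backup)
    return argv
```

The returned list could be handed to something like subprocess.run(); where
that wrapper would live, and where the backup file should be kept, is
exactly the open question.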

But I have no idea whose responsibility it is to maintain this file,
or indeed where it came from.  If you know where I should bug-report,
or if you can pass the information to whoever is in charge, I'd be
grateful.

Well, this list is the development list for MD and mdadm, so you're in 
the right place.  I think we've narrowed down what needs fixing.

Oh - and as for backup files - newer arrays by default don't need or use
them. So that again could be part of the problem ...

Well, the metadata versions with superblock at the end still need them, 
as they have to maintain data offset == 0.

How do newer arrays get around the need for a backup file when doing a
RAID5 -> RAID6 (with N -> N+1 disks) reshape?

Move the data offsets.  The background task maintains a boundary line 
within the array during reshape--as stripes are moved and reshaped, the 
boundary is moved.  One stripe at a time is frozen.
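
A toy model of that idea, in case it helps.  This is a deliberately
simplified sketch, not the kernel's algorithm: a single Python list stands
in for a member device, one list slot is one stripe, and the names
reshape_in_place and convert are made up for the example.  It only covers
the case where the data offset moves downward into free space ahead of the
data.

```python
def reshape_in_place(device, old_offset, new_offset, n_stripes, convert):
    """Rewrite n_stripes stripes from old_offset to new_offset, in place.

    device  -- list standing in for the device; one slot per stripe
    convert -- function turning an old-layout stripe into a new-layout one
    """
    if not new_offset < old_offset:
        raise ValueError("this toy only handles moving the data offset down")
    for boundary in range(n_stripes):
        src = old_offset + boundary   # stripe still in the old layout
        dst = new_offset + boundary   # where it lands in the new layout
        # dst < src, and every slot below src was either free space or has
        # already been converted, so this write never clobbers unread data.
        # The old copy at src stays intact until a later step reuses it.
        device[dst] = convert(device[src])
        # Stripes 0..boundary are now in the new layout at new_offset;
        # stripes after boundary are still in the old layout at old_offset.
    return device
```

For example, calling it on [None, None, "a", "b", "c", "d"] with
old_offset=2, new_offset=0, n_stripes=4 and convert=str.upper leaves the
converted stripes in the first four slots without ever needing a scratch
copy.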

Phil