Hi David,
Let me add some history from my memory:
On 10/1/20 11:04 AM, David Madore wrote:
On Thu, Oct 01, 2020 at 03:10:21PM +0100, Wols Lists wrote:
Except, is this the problem? If the reshape fails to start, I don't quite
see how the restart service file can be to blame.
I'm confident this is the problem. I've changed the service file and
the reshape now works fine for loopback devices on my system (I even
tried it on a few small partitions to make sure).
Yes, but see below.
As far as I understand it, here's what happens: when mdadm is given a
reshape command on a system with systemd (and unless
MDADM_NO_SYSTEMCTL is set), instead of handling the reshape itself, it
calls (via the continue_via_systemd() function in Grow.c) "systemctl
restart mdadm-grow-continue@${device}.service" (where ${device} is the
md device base name). This is defined via a systemd template file
distributed by mdadm, namely
/lib/systemd/system/mdadm-grow-continue@.service which itself calls
(ExecStart) "/sbin/mdadm --grow --continue /dev/%I" (where %I is,
again, the md device base name). This does not pass a --backup-file
parameter, so when the initial call needed one, the service
immediately terminates with an error message, which is lost because
standard input/output/error are redirected to /dev/null by the service
file. So the reshape never starts.
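The chain of commands described above can be modelled roughly as follows. This is a simplified Python sketch, not mdadm's C code: the function name `systemd_handoff` is hypothetical, it treats MDADM_NO_SYSTEMCTL as set whenever it is present in the environment (mdadm's exact test may differ), and it only shows which command lines end up being run. What it illustrates is that nothing from the original `--grow` call, including any backup file, reaches the unit's ExecStart.

```python
import os
from typing import List, Mapping, Optional, Tuple


def systemd_handoff(devnm: str,
                    env: Mapping[str, str] = os.environ
                    ) -> Optional[Tuple[List[str], List[str]]]:
    """Hypothetical model of the systemd handoff described in the thread.

    Returns (command mdadm runs, command the unit's ExecStart runs),
    or None when mdadm would fork the reshape process itself.
    """
    if "MDADM_NO_SYSTEMCTL" in env:
        return None  # mdadm forks its own background reshape process
    # what continue_via_systemd() asks systemctl to do
    launcher = ["systemctl", "restart", f"mdadm-grow-continue@{devnm}.service"]
    # the template's ExecStart with %I expanded to the md device base name;
    # nothing from the original --grow call (e.g. --backup-file) is carried over
    exec_start = ["/sbin/mdadm", "--grow", "--continue", f"/dev/{devnm}"]
    return launcher, exec_start
```

For example, `systemd_handoff("md0", env={})` yields an ExecStart command with no `--backup-file` argument even if the original `mdadm --grow` was given one, which is why the service exits immediately in that case.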
The original problem that service file attempts to solve is that the
mdadm process you invoke never does the reshape itself. In the absence
of systemd, mdadm always forked a background process to do the reshape,
passing it everything necessary. Systemd likes to kill off child
processes when the main process ends, so *poof*, no reshape.
I think the way to fix this would be to rewrite the systemd service
file so that it first checks the existence of
/run/mdadm/backup_file-%I and, if it exists, passes it as the
--backup-file parameter. (I don't know how to do this. For my own system I wrote a
quick fix which assumes that --backup-file will always be present,
which is just as wrong as assuming that it will always be absent.)
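The proposed check could be sketched as a small wrapper that the unit's ExecStart calls instead of mdadm directly. This is a hypothetical Python sketch: the function name `build_continue_argv` is invented, and it assumes, as the message suggests, that /run/mdadm/backup_file-<device> can be handed straight to --backup-file. It does nothing about the persistence problem with /run discussed in the reply.

```python
import os
from typing import List


def build_continue_argv(devnm: str, run_dir: str = "/run/mdadm") -> List[str]:
    """Build the `mdadm --grow --continue` command for md device `devnm`.

    Hypothetical wrapper logic: add --backup-file only when the
    per-array backup_file-<devnm> entry exists under run_dir.
    """
    argv = ["/sbin/mdadm", "--grow", "--continue"]
    backup = os.path.join(run_dir, f"backup_file-{devnm}")
    if os.path.exists(backup):
        # pass the entry as-is; its contents are not inspected here
        argv.append(f"--backup-file={backup}")
    argv.append(f"/dev/{devnm}")
    return argv
```

A wrapper script would then replace itself with mdadm via `os.execv(argv[0], argv)`, so that systemd keeps tracking the same process as the unit's main process.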
Meanwhile, at the time this was fixed, mdadm's defaults pretty much
ensured that a backup file was never needed. The temporary space
provided by the backup file is now only needed when there isn't any
leeway in the data offsets of the member devices. Reshaping without a
backup file is also twice as fast. So the systemd hack service was
created without allowance for a backup file.
However, your solution of using the RAM-backed /run directory is another
disaster in the making, as its contents are lost on shutdown, defeating
the whole point of the backup file. It needs to go somewhere else:
outside of the raid being reshaped, and persistent through system
crashes and shutdowns.
But I have no idea whose responsibility it is to maintain this file,
or indeed where it came from. If you know where I should bug-report,
or if you can pass the information to whoever is in charge, I'd be
grateful.
Well, this list is the development list for MD and mdadm, so you're in
the right place. I think we've narrowed down what needs fixing.
Oh - and as for backup files - newer arrays by default don't need or use
them. So that again could be part of the problem ...
Well, the metadata versions with the superblock at the end (0.90 and
1.0) still need them, as they have to maintain data offset == 0.
How do newer arrays get around the need for a backup file when doing a
RAID5 -> RAID6 (with N -> N+1 disks) reshape?
Move the data offsets. The background task maintains a boundary line
within the array during the reshape: as stripes are moved and reshaped,
the boundary is moved. One stripe at a time is frozen.
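Why moving the data offset removes the need for a backup file can be checked with a toy model. The Python sketch below is a simplified model, not the kernel's reshape code, and `forward_reshape_is_crash_safe` is a hypothetical name: each member device is treated as a sequence of stripe-sized rows, parity placement is ignored, `w_old`/`w_new` are the data chunks per stripe before and after, and `old_off`/`new_off` are data offsets measured in rows. It walks through the new layout one stripe at a time and reports whether any write would land on an old stripe that still holds data not yet written in the new layout. For RAID5 -> RAID6 with N -> N+1 disks the number of data disks stays at N-1, so `w_old == w_new`; with an unchanged offset every stripe would be rewritten on top of itself.

```python
import math


def forward_reshape_is_crash_safe(total_chunks: int, w_old: int, w_new: int,
                                  old_off: int, new_off: int) -> bool:
    """Toy model of a forward reshape, one stripe at a time.

    Rows are stripe-sized units on each member device; parity is ignored.
    Returns True if no write ever lands on an old stripe that still holds
    data not yet written in the new layout (so no backup copy is needed).
    """
    if old_off < 0 or new_off < 0:
        raise ValueError("data offsets cannot be negative")
    old_rows = math.ceil(total_chunks / w_old)
    new_rows = math.ceil(total_chunks / w_new)
    for k in range(new_rows):
        target = new_off + k          # device row that new stripe k is written to
        r = target - old_off          # old stripe currently stored on that row
        if r < 0 or r >= old_rows:
            continue                  # unused space, nothing to clobber
        # chunks from k * w_new onwards are not yet in the new layout
        old_end = min((r + 1) * w_old, total_chunks)
        if old_end > k * w_new:
            return False              # would overwrite live data: needs a backup
    return True
```

For example, `forward_reshape_is_crash_safe(12, 3, 3, old_off=0, new_off=0)` returns False (an offset of 0, as with superblock-at-end metadata, leaves nowhere to move to), while `forward_reshape_is_crash_safe(12, 3, 3, old_off=8, new_off=7)` returns True: moving the offset down by a single row is enough in this model. Reshapes that reduce the number of data disks fail this forward check with a one-row gap, which is one reason the direction of a reshape matters.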
Phil