On Wed, Sep 30, 2020 at 08:03:32PM +0100, Wols Lists wrote:
> On 30/09/20 19:58, David Madore wrote:
> > 	mdadm - v4.1 - 2018-10-01
> >
> > - which I think is roughly contemporaneous to the kernel version I'm
> > using.  But the problem still persists (with the exact same symptoms
> > and details).
>
> Except that mdadm is NOT the problem. The problem is that the kernel and
> mdadm are not matched date-wise, and because the kernel is a
> franken-kernel you need to use a different kernel.

I don't understand what you mean by "matched date-wise".  The kernel
I'm using is a longterm support branch (4.9) which was frozen at the
same approximate date as the mdadm I just installed.  And it was also
the same longterm support branch that was used in the Debian oldstable
(9 aka stretch).

Do you mean that there is no mdadm version which is compatible with
the 4.9 kernels?  How often does the mdadm-kernel interface break
compatibility?

> Use a rescue disk!!! That way, you get a kernel and an mdadm that are
> the same approximate date. As it stands, your frankenkernel is too new
> for mdadm 3.4, but too ancient for a modern kernel.

Using a rescue disk would mean taking the system down for longer than
I can afford (I can afford to have this particular partition down for
a long time, but not the whole system... which unfortunately resides
on the same disks).  So I'd like to keep this as a very last resort,
or at least, not consider it until I've fully understood what's going
on.

(It's especially problematic that I have absolutely no idea of the
speed at which I can expect the reshape to take place, compared to an
ordinary resync.  If you could give me a ballpark figure, it would
help me decide.  My disks resync at ~120MB/sec, and the RAID array I
wish to reshape is ~900GB per partition, so it takes a few hours to
do an "ordinary" resync: I assume a reshape will take much longer, but
how much longer are we talking?)

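For reference, the "few hours" figure for an ordinary resync follows
from simple division.  A rough sketch, assuming decimal units (1 GB =
1000 MB) and a constant rate over the whole device; resync_eta is just
an illustrative helper name, and this says nothing about how much
slower a reshape would be:

```shell
# Rough duration of a plain resync: per-device size divided by the
# sustained rate.  Assumes 1 GB = 1000 MB and a constant rate.
resync_eta() {
    size_gb=$1      # per-partition size in GB
    rate_mb_s=$2    # observed resync rate in MB/s
    secs=$(( size_gb * 1000 / rate_mb_s ))
    printf '%dh%02dm\n' $(( secs / 3600 )) $(( secs % 3600 / 60 ))
}

resync_eta 900 120    # prints 2h05m
```
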
But I made another discovery in the meantime: when I run the --grow
command, something starts a systemd service called
mdadm-grow-continue@<device>.service (so in my case
mdadm-grow-continue@md112.service; I wasn't able to work out exactly
who the caller is), a unit which contains

    ExecStart=/sbin/mdadm --grow --continue /dev/%I

so it ran

    /sbin/mdadm --grow --continue /dev/md112

which failed with

    mdadm: array: Cannot grow - need backup-file
    mdadm:  Please provide one with "--backup=..."

Now if I override this service to read

    ExecStart=/sbin/mdadm --grow --continue /dev/%I --backup=/run/mdadm/backup_file-%I

then it seems to work correctly, at least on my toy example with
loopback devices (but then I suppose it will break the reshape cases
where no backup file is needed?).

I'm very confused as to what's going on here: was this unit file
supposed to work in the first place?  Why is it needed?  Where does it
come from?  Am I permitted to run mdadm --grow --continue myself?  Am
I supposed to?  How did all of this work before systemd came in?

PS: Oh, there's already a Debian bug for this: #884719
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884719
- but it's not marked as fixed.  Is array reshaping broken on Debian?

Cheers,

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )