Re: Failed to grow RAID - "failed to restore critical section for reshape" after reboot

Christian <christian+lr@xxxxxxxxx> · Wed, 18 Jul 2018 10:15:07 +0200

Hello again,

On 17.07.2018 15:04, Christian wrote:
> Hello all,
>
> today I tried to grow a RAID 5 from 4 to 5 disks but failed to do so.
> The additional disk was prepared by writing a gpt disklabel using parted on it, creating a primary partition and setting
> raid to "on".
>
> Then I added it to the array and tried to grow it using
> # mdadm /dev/md0 --add /dev/sdc1
> # mdadm --grow --raid-devices=5 /dev/md0 --backup-file=/root/md0.bak
>
> /proc/mdstat showed an increasing ETA with 0kb/s progress.
> Then I noticed that I had accidentally inserted the new disk into the server while it was online, rebooting only the
> virtual machine (which has direct access to the SATA controller, as it is passed through) and not the host...
> After that I rebooted the VM-Host (XenServer). The VM didn't boot, because it failed to mount a filesystem, which is
> located on a LVM which is on top of the RAID.
>
> After disabling the fstab entry, the system booted up again. But now md0 gets assembled with all devices marked as
> spares.
>
> Trying to reassemble the RAID fails:
> # mdadm --assemble --scan
> mdadm: Failed to restore critical section for reshape, sorry.
>        Possibly you needed to specify the --backup-file
>
> I tried passing the /root/md0.bak file as --backup-file, but it still doesn't work.
>
> From that point on, all actions were performed on a snapshot of the underlying partitions.
>
> Maybe somebody is able to guide me how to reassemble my array without losing too much data?
>
> Thanks in advance,
>
> Christian
>
>
>
> Some diagnostic output follows:
> [...]

I finally found a solution after investigating further.
The backup file consists of about 6.1 MiB of zeroes. That's why I omitted the --backup-file parameter from the
following commands and passed --invalid-backup instead.
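
For anyone wanting to check their own backup file the same way, here is a minimal sketch of an all-zeroes test. The function name is_all_zero is made up for illustration, and the sketch assumes a cmp that supports -n (GNU diffutils does); it is not necessarily how the file was inspected in this case.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): succeed only if the file is
# non-empty and every byte in it is zero.
is_all_zero() {
    file=$1
    size=$(wc -c < "$file") || return 2
    # An empty file has no content to compare; report it as "not all zeroes".
    [ "$size" -gt 0 ] || return 1
    # Compare the first $size bytes against /dev/zero; -s suppresses output.
    cmp -s -n "$size" "$file" /dev/zero
}

# Example call (path taken from this thread):
#   is_all_zero /root/md0.bak && echo "backup file contains only zeroes"
```

A zero exit status means the file holds nothing but zero bytes, which would explain why passing it as --backup-file does not help.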

# mdadm --assemble /dev/md0 --force --verbose --invalid-backup /dev/sda1 /dev/sdd1 /dev/sde1 /dev/sdb1 /dev/sdc1
This command resulted in the following message:

mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument

The syslog contained the following line:
md/raid:md0: reshape_position too early for auto-recovery - aborting.

That led me to the solution, which was to revert the reshape started by the grow command:
# mdadm --assemble /dev/md0 --force --verbose --update=revert-reshape --invalid-backup /dev/sda1 /dev/sdd1 /dev/sde1 /dev/sdb1 /dev/sdc1

Growing the RAID with the initial command still failed, even after removing the new device from the array, overwriting the disk's first 4 MB, and re-adding it.
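
The exact command used for the overwrite is not given above. One common way to zero the start of a device is dd, sketched below under that assumption. The helper name wipe_head is hypothetical, status=none assumes GNU dd, and the command is destructive, so the target path must be double-checked before use.

```shell
#!/bin/sh
# Hypothetical helper: overwrite the first 4 MiB of the target with zeroes.
# DESTRUCTIVE: everything in that region, including any md superblock
# stored there, is lost.
wipe_head() {
    target=$1
    # conv=notrunc keeps a regular file at its original length (it makes
    # no difference for block devices); status=none silences the summary.
    dd if=/dev/zero of="$target" bs=1M count=4 conv=notrunc status=none
}

# Example call on a scratch file rather than a real partition:
#   wipe_head /tmp/scratch.img
```

If the goal is only to clear the md metadata, mdadm --zero-superblock on the removed member is the more targeted alternative.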

Syslog gives a hint at the reason for the failure (all messages were logged within the same second):
kernel: [68111.425022] RAID conf printout:
kernel: [68111.425027]  --- level:5 rd:5 wd:5
kernel: [68111.425043]  disk 0, o:1, dev:sda1
kernel: [68111.425044]  disk 1, o:1, dev:sdd1
kernel: [68111.425045]  disk 2, o:1, dev:sde1
kernel: [68111.425045]  disk 3, o:1, dev:sdb1
kernel: [68111.425046]  disk 4, o:1, dev:sdc1
kernel: [68111.425358] md: reshape of RAID array md0
kernel: [68111.425359] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
kernel: [68111.425359] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
kernel: [68111.425362] md: using 128k window, over a total of 5860391424k.
systemd[1]: Created slice system-mdadm\x2dgrow\x2dcontinue.slice.
systemd[1]: Started Manage MD Reshape on /dev/md0.
systemd[1]: mdadm-grow-continue@md0.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: mdadm-grow-continue@md0.service: Unit entered failed state.
systemd[1]: mdadm-grow-continue@md0.service: Failed with result 'exit-code'.

These lines led me to Debian bug report [1]. Removing the --backup-file parameter let me grow the array as intended. Problem solved.

Regards,

Christian

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884719
