On Sun, Dec 11, 2022 at 01:55:54PM +0000, Wols Lists wrote:
> On 10/12/2022 19:59, Chris Dunlop wrote:
>> Hi,
>>
>> When replacing a failed disk with a new one using --add, is it
>> possible to restart a partially-complete --add, e.g. after a reboot?
>>
>> I have a raid-6 with a failed disk, and used --add to add a new disk
>> as a replacement. From /proc/mdstat, "finish" told me it would take
>> around 24 hours to complete the add.
>>
>> The machine was rebooted some hours into the add, and on restart the
>> md was missing the new disk (and the failed disk). I tried to
>> --re-add the new disk again, but mdadm told me it's "not possible":
>>
>>   mdadm: --re-add for /dev/sdh1 to /dev/md0 is not possible
>>
>> I ended up --add'ing the disk again, so the 24 hours to complete
>> started again.
>>
>> Is this expected, and/or is there a way to restart the --add rather
>> than starting from the beginning again?
>
> Raid is supposed to be robust, so this surprises me. When it rebooted
> it should have known it was part-way through a rebuild. Was it a
> controlled reboot, or a crash and restart?

Controlled reboot.

> What I would expect is that the array would be rebuilt including sdh1,
> and the rebuild would just carry on. So I suspect that whatever went
> wrong, it was a bit further back than that - somehow md forgot that
> sdh1 was now part of the array.

Yes, I was expecting that the --add would be periodically recording its
current "synced to" block or offset so on restart it would be able to pick
up where it left off (or a little before).
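
As a rough illustration of the "how far has it got" information referred to
here, the following is a minimal Python sketch that pulls the rebuild position
out of /proc/mdstat-style text. It only reads the live progress line; whether
md persists that position across a reboot is exactly the open question in this
thread, and the sketch makes no claim about that. The function name
rebuild_progress and the sample text (device names, block counts, speeds) are
made up for illustration, and the regular expression assumes the usual
"recovery = N% (done/total) finish=Nmin" layout of the progress line.

```python
import re
from typing import Dict, Optional

# Hypothetical sample in the usual /proc/mdstat layout; all numbers are invented.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh1[8] sda1[0] sdb1[1] sdc1[2]
      1000 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UUUU_]
      [=>...................]  recovery =  8.5% (85/1000) finish=1300.0min speed=100K/sec

unused devices: <none>
"""

# Assumed shape of the progress line md prints while a rebuild is running.
PROGRESS_RE = re.compile(
    r"(?P<action>recovery|resync|reshape|check)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min"
)


def rebuild_progress(mdstat_text: str, array: str = "md0") -> Optional[Dict[str, object]]:
    """Return progress details for `array`, or None if no progress line is found."""
    in_array = False
    for line in mdstat_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new stanza, e.g. "md0 : active raid6 ..."
            in_array = line.split()[0] == array
            continue
        if in_array:
            match = PROGRESS_RE.search(line)
            if match:
                return {
                    "action": match.group("action"),
                    "percent": float(match.group("pct")),
                    "done": int(match.group("done")),
                    "total": int(match.group("total")),
                    "finish_min": float(match.group("finish")),
                }
    return None


if __name__ == "__main__":
    print(rebuild_progress(SAMPLE_MDSTAT))
```

On a real system the same function could be fed open("/proc/mdstat").read()
before and after a reboot to see whether the reported position carried over
or started again from zero.
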
> Weird.

Yup.
Tks,
Chris