On 10/12/2022 19:59, Chris Dunlop wrote:
> Hi,
>
> When replacing a failed disk with a new one using --add, is it possible
> to restart a partially-complete --add, e.g. after a reboot?
>
> I have a raid-6 with a failed disk, and used --add to add a new disk as
> a replacement. From /proc/mdstat, "finish" told me it would take around
> 24 hours to complete the add.
>
> The machine was rebooted some hours into the add, and on restart the md
> was missing the new disk (and the failed disk). I tried to --re-add the
> new disk again, but mdadm told me it's "not possible":
>
>   mdadm: --re-add for /dev/sdh1 to /dev/md0 is not possible
>
> I ended up --add'ing the disk again, so the 24 hours to complete started
> again.
>
> Is this expected, and/or is there a way to restart the --add rather than
> starting from the beginning again?

RAID is supposed to be robust, so this surprises me. When it rebooted it
should have known it was part-way through a rebuild. Was it a controlled
reboot, or a crash and restart?

What I would expect is that the array would be reassembled including sdh1,
and the rebuild would just carry on. So I suspect that whatever went
wrong, it was a bit further back than that - somehow md forgot that sdh1
was now part of the array.

Weird.
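
If you want to watch whether the recovery really does carry on after a
restart, something like the following could log the progress line from
/proc/mdstat. It is only a rough sketch: the function name
rebuild_progress and the SAMPLE text are made up for illustration (the
sample is not output from your machine), and it assumes the usual
"recovery = N% (done/total) finish=Nmin" layout of the progress line.

```python
import re

# Matches the progress line md prints in /proc/mdstat during a rebuild, e.g.
#   [==>....]  recovery = 12.5% (244172800/1953382400) finish=1260.0min ...
PROGRESS = re.compile(
    r"(?P<action>recovery|resync|reshape|check)\s*=\s*(?P<percent>[\d.]+)%"
    r"\s*\((?P<done>\d+)/(?P<total>\d+)\)"
    r"\s*finish=(?P<finish_min>[\d.]+)min"
)


def rebuild_progress(mdstat_text, array="md0"):
    """Return progress info for `array`, or None if no progress line is found."""
    in_array = False
    for line in mdstat_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new stanza ("md0 : active raid6 ...").
            in_array = line.split()[0] == array
            continue
        if in_array:
            m = PROGRESS.search(line)
            if m:
                return {
                    "action": m["action"],
                    "percent": float(m["percent"]),
                    "finish_hours": float(m["finish_min"]) / 60,
                }
    return None


# Illustrative sample only (hypothetical device names and numbers).
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh1[8] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      11720294400 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/7] [_UUUUUUU]
      [==>..................]  recovery = 12.5% (244172800/1953382400) finish=1260.0min speed=22600K/sec

unused devices: <none>
"""

if __name__ == "__main__":
    # On a live system, pass open("/proc/mdstat").read() instead of SAMPLE.
    print(rebuild_progress(SAMPLE))
```

If the percentage is well above zero straight after a reboot, md picked
up from where it left off; if it starts again near zero, it did not.
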
Cheers,
Wol