Re: Is it possible to restart --add?

Chris Dunlop <chris@xxxxxxxxxxxx> · Mon, 12 Dec 2022 18:05:30 +1100

On Sun, Dec 11, 2022 at 01:55:54PM +0000, Wols Lists wrote:
> On 10/12/2022 19:59, Chris Dunlop wrote:
>> Hi,
>>
>> When replacing a failed disk with a new one using --add, is it
>> possible to restart a partially-complete --add, e.g. after a reboot?
>>
>> I have a raid-6 with a failed disk, and used --add to add a new disk
>> as a replacement. From /proc/mdstat, "finish" told me it would take
>> around 24 hours to complete the add.
>>
>> The machine was rebooted some hours into the add, and on restart the
>> md was missing the new disk (and the failed disk). I tried to
>> --re-add the new disk again, but mdadm told me it's "not possible":
>>
>>   mdadm: --re-add for /dev/sdh1 to /dev/md0 is not possible
>>
>> I ended up --add'ing the disk again, so the 24 hours to complete
>> started again.
>>
>> Is this expected, and/or is there a way to restart the --add rather
>> than starting from the beginning again?
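
For reference, the "finish" figure quoted above comes from the progress line that /proc/mdstat shows while a recovery is running. Below is a minimal Python sketch for pulling the percentage and the time estimate out of such a line. The sample text is invented for illustration (it is not output from the array in this thread), and parse_recovery is a hypothetical helper name, not part of mdadm:

```python
import re

# Hypothetical sample of a /proc/mdstat progress line; the numbers are
# invented for illustration and are not from the array in this thread.
SAMPLE = (
    "      [==>..................]  recovery = 12.6% (37043392/292945152) "
    "finish=127.5min speed=33440K/sec"
)

# Matches the action name, percentage, done/total counters and the
# "finish=" estimate (reported in minutes).
PROGRESS_RE = re.compile(
    r"(?P<action>recovery|resync|reshape|check)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min"
)


def parse_recovery(text):
    """Return progress details from mdstat-style text, or None if no progress line is found."""
    m = PROGRESS_RE.search(text)
    if m is None:
        return None
    return {
        "action": m.group("action"),
        "percent": float(m.group("pct")),
        "done": int(m.group("done")),
        "total": int(m.group("total")),
        "finish_minutes": float(m.group("finish")),
    }


if __name__ == "__main__":
    print(parse_recovery(SAMPLE))
```

To try it against a live system, pass the contents of /proc/mdstat (e.g. open("/proc/mdstat").read()) instead of SAMPLE; the function returns None when no sync is in progress.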

> Raid is supposed to be robust, so this surprises me. When it rebooted
> it should have known it was part-way through a rebuild. Was it a
> controlled reboot, or a crash and restart?

Controlled reboot.

> What I would expect is that the array would be rebuilt including sdh1,
> and the rebuild would just carry on. So I suspect that whatever went
> wrong, it was a bit further back than that - somehow md forgot that
> sdh1 was now part of the array.

Yes, I was expecting that the --add would periodically record its
current "synced to" block or offset, so that on restart it could pick
up where it left off (or a little before).
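
The expectation described above amounts to periodic checkpointing. As a toy Python sketch of that general idea only (this is not how md implements recovery, and rebuild, save_checkpoint and the "synced_to" field are all hypothetical names chosen for illustration):

```python
import json
import os


def save_checkpoint(path, offset):
    """Record the current offset atomically (write a temp file, then rename)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"synced_to": offset}, f)
    os.replace(tmp, path)


def rebuild(total_blocks, checkpoint_path, checkpoint_every=1000, stop_after=None):
    """Toy resumable rebuild: walk the blocks in order, saving progress periodically.

    stop_after simulates a reboot by returning early once that many blocks
    have been processed in this run. Returns (start_offset, reached_offset).
    """
    # Resume from the last recorded offset, if there is one.
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["synced_to"]

    processed = 0
    for block in range(start, total_blocks):
        # (the real work on `block` would go here)
        processed += 1
        done = block + 1
        if done % checkpoint_every == 0:
            save_checkpoint(checkpoint_path, done)
        if stop_after is not None and processed >= stop_after:
            return start, done  # interrupted part-way through

    # Finished: the checkpoint is no longer needed.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return start, total_blocks
```

With total_blocks=10000 and checkpoint_every=1000, a run interrupted after 2500 blocks leaves a checkpoint at 2000, so the next run resumes from 2000, i.e. "a little before" where it stopped, instead of from 0.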

> Weird.

Yup.

Tks,

Chris