Re: Recovering mdraid after node reboot

Bumping this thread for your attention.
Can anyone confirm whether the recovery procedure described below is
reliable and correct?

On Wed, Jun 5, 2024 at 4:33 PM Lakshmi Narasimhan Sundararajan
<lsundararajan@xxxxxxxxxxxxxxx> wrote:
>
> Hi Team,
> A very good day to you all.
>
> I have a scenario in which an mdraid array (originally raid0) fails to
> assemble after a node reboot that occurred while the array was
> expanding capacity (and was therefore temporarily raid4).
> Can you please confirm that the forced recovery in this scenario is
> always correct and reliable.
>
> 1/ create md array (raid0)
> example cmd:
> mdadm -C /dev/md/vol0 -n 2 --metadata 1.2 -c 1024 -l 0 /dev/sda /dev/sdb
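>
> As a sanity check (my addition, not part of the original steps), the
> newly created array can be inspected before use; device names match
> the example above:
> ```
> # confirm both members are present and the level is raid0
> cat /proc/mdstat
> mdadm --detail /dev/md/vol0
> ```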
>
> 2/ The array fills up, so its capacity needs to be expanded.
> Since this is a virtual environment, I can resize the /dev/sda and
> /dev/sdb disks attached to my server node from outside.
>
> After increasing the capacity of the backing disks /dev/sda and
> /dev/sdb, I attempt to grow the array to its maximum capacity.
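>
> For reference, the guest kernel usually has to be told to re-read the
> grown device size before md can see it. A sketch, assuming the disks
> appear as SCSI devices (the sysfs path differs for other bus types):
> ```
> # ask the kernel to re-read the (grown) device capacity
> echo 1 > /sys/block/sda/device/rescan
> echo 1 > /sys/block/sdb/device/rescan
> # confirm the new sizes are visible
> lsblk /dev/sda /dev/sdb
> ```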
>
> 3/ Convert md array to raid 4
> mdadm -G /dev/md/vol0 -l 4
>
> 4/ grow array to max capacity
> mdadm -G /dev/md/vol0 --size max
>
> 5/ put it back to raid 0
> mdadm -G /dev/md/vol0 -l 0
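>
> A note on ordering (my assumption, assuming the array name md126 from
> the logs below): the conversion back to raid0 should only happen after
> the raid4 reshape/resync has finished, which can be checked first:
> ```
> # watch reshape progress; convert back only once it completes
> cat /proc/mdstat
> cat /sys/block/md126/md/sync_action   # "idle" once the reshape is done
> ```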
>
> In the above sequence, I only expand capacity by resizing the array
> members, not by adding a new disk to the array. While this sequence
> was in progress, a power failure hit and the node rebooted.
>
> After the node reboot, the array failed to come online because it was
> still in the raid4 state.
> ```
> Jun 03 23:08:35 kernel: md/raid:md126: not clean -- starting
> background reconstruction
> Jun 03 23:08:35 kernel: md/raid:md126: device sda operational as raid disk 0
> Jun 03 23:08:35 kernel: md/raid:md126: device sdb operational as raid disk 1
> Jun 03 23:08:35 kernel: md/raid:md126: cannot start dirty degraded array.
> Jun 03 23:08:35 md/raid:md126: failed to run raid set.
> Jun 03 23:08:35 kernel: md: pers->run() failed ...
> ```
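>
> For context, I also considered a less destructive forced assembly
> first, which only updates the superblocks rather than rewriting them.
> A sketch (I have not verified this in the dirty-degraded raid4 state):
> ```
> mdadm --stop /dev/md126
> # --force accepts the dirty state, --run starts the array anyway
> mdadm --assemble --force --run /dev/md/vol0 /dev/sda /dev/sdb
> ```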
>
> I recovered the array using the sequence below, given that the array
> expansion was done only by resizing the disks.
>
> mdadm -S /dev/md126    # /dev/md126 mapped to /dev/md/vol0 above
> mdadm -C /dev/md/vol0 -n 2 --metadata 1.2 -c 1024 -l 0 --assume-clean
> --force /dev/sda /dev/sdb
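>
> One thing worth noting for anyone repeating this (my suggestion, not
> part of the original steps): re-creating with -C overwrites the
> superblocks, so it seems prudent to record the old metadata first so
> that data offsets and chunk size can be compared afterwards:
> ```
> # snapshot the existing superblocks before -C overwrites them
> mdadm -E /dev/sda > sda-superblock.txt
> mdadm -E /dev/sdb > sdb-superblock.txt
> ```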
>
> This brought the array back online, and its contents were intact too.
>
> So in the above sequence of actions, I want to understand the following:
> -- Is the above method a reliable way to recover the array in case
> the node rebooted while the array was in the raid4 state?
>
> -- Please share your thoughts on the recovery procedure and on data
> reliability when recovering through the above sequence.
>
> Many thanks and kind regards for your help and insights.
> LN




