Re: Lockup of (raid5 or raid6) + vdo after taking out a disk under load

On Wed, 2024-07-31 at 16:41 -0400, Bryan Gurney wrote:
> Hi Konstantin,
>
> This sounds a lot like something that I encountered with md, back in
> 2019, on the old vdo-devel mailing list:
>
> https://listman.redhat.com/archives/vdo-devel/2019-August/000171.html
>
> Basically, I had a RAID-5 md array that was in the process of
> recovery:
>
> $ cat /proc/mdstat
> Personalities : [raid0] [raid6] [raid5] [raid4]
> md0 : active raid5 sde[4] sdd[2] sdc[1] sdb[0]
>       2929890816 blocks super 1.2 level 5, 512k chunk, algorithm 2
> [4/3] [UUU_]
>       [=>...................]  recovery =  9.1% (89227836/976630272)
> finish=85.1min speed=173727K/sec
>       bitmap: 0/8 pages [0KB], 65536KB chunk
>
> Note that the speed of the recovery is 173,727 KB/sec, which is less
> than the sync_speed_max value:
>
> $ grep . /sys/block/md0/md/sync_speed*
> /sys/block/md0/md/sync_speed:171052
> /sys/block/md0/md/sync_speed_max:200000 (system)
> /sys/block/md0/md/sync_speed_min:1000 (system)
>
> ...And when I decreased "sync_speed_max" to "65536", I stopped seeing
> hung task timeouts.
>
> There's a similar setting in dm-raid: the "--maxrecoveryrate" option
> of lvchange.  So, to set the maximum recovery rate to 64 MiB per
> second per device, this would be the command, for an example VG/LV of
> "p_r5/testdmraid5"
>
> # lvchange --maxrecoveryrate 64M p_r5/testdmraid5
>
> (Older hard disk drives may not have a sequential read / write speed
> of more than 100 MiB/sec; this meant that md's default of 200 MiB/sec
> was "too fast", and would result in the recovery I/O starving the VDO
> volume from being able to service I/O.)
>
> The current value of max_recovery_rate for dm-raid can be displayed
> with "lvs -a -o +raid_max_recovery_rate".
>
> By reducing the maximum recovery rate for the dm-raid RAID-5 logical
> volume, does this result in the hung task timeouts for the
> "dm-vdo0-bioQ*" to not appear, and for the fio job to continue
> writing?

Thank you. I'm trying this out, but it doesn't seem to be working that well
(unless perhaps something has changed in userspace LVM since the 2.03.11 I am
using?).
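
(For comparison, the md-level workaround from your original report would just be a sysfs
write; a sketch assuming the array is named md0, with the value in KB/sec as in your
sync_speed_max listing:)

    # echo 65536 > /sys/block/md0/md/sync_speed_max

My setup here is dm-raid rather than md, though, so the lvchange route is what I tried.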

So, after executing the original steps to reproduce, I have these two volumes:

    $ lvs
      LV                    VG   Attr       LSize   Pool                  Origin Data%  Meta%  Move Log Cpy%Sync Convert
      deco_vol              p_r5 vwi-XXv-X- 100.00g vdo_internal_deco_vol
      vdo_internal_deco_vol p_r5 dwi-XX--X-  20.00g

The suggested lvchange command is rejected on both of them:

    $ lvchange --maxrecoveryrate 64M p_r5/deco_vol
      Command on LV p_r5/deco_vol uses options that require LV types raid .
      Command not permitted on LV p_r5/deco_vol.
    $ lvchange --maxrecoveryrate 64M p_r5/vdo_internal_deco_vol
      Command on LV p_r5/vdo_internal_deco_vol uses options that require LV types raid .
      Command not permitted on LV p_r5/vdo_internal_deco_vol.
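
Presumably the option is only accepted by the LV that actually carries a raid segment;
something like this should show which one that is (a guess on my side, assuming the
segtype report field is available in 2.03.11):

    $ lvs -a -o lv_name,segtype p_r5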

Also, `lvs -a -o +raid_max_recovery_rate` shows nothing in the MaxSync field. However,
it does reveal the various internal volumes:

    $ lvs -a -o +raid_max_recovery_rate
      LV                                     VG   Attr       LSize   Pool                  Origin Data%  Meta%  Move Log Cpy%Sync Convert MaxSync
      deco_vol                               p_r5 vwi-XXv-X- 100.00g vdo_internal_deco_vol
      vdo_internal_deco_vol                  p_r5 dwi-XX--X-  20.00g
      [vdo_internal_deco_vol_vdata]          p_r5 rwi-aor---  20.00g                                                     100.00
      [vdo_internal_deco_vol_vdata_rimage_0] p_r5 iwi-aor---  10.00g
      [vdo_internal_deco_vol_vdata_rimage_1] p_r5 iwi-aor---  10.00g
      [vdo_internal_deco_vol_vdata_rimage_2] p_r5 iwi-aor---  10.00g
      [vdo_internal_deco_vol_vdata_rmeta_0]  p_r5 ewi-aor---   4.00m
      [vdo_internal_deco_vol_vdata_rmeta_1]  p_r5 ewi-aor---   4.00m
      [vdo_internal_deco_vol_vdata_rmeta_2]  p_r5 ewi-aor---   4.00m

So I tried executing the command on them:

    $ lvchange --maxrecoveryrate 64M p_r5/{vdo_internal_deco_vol_vdata,vdo_internal_deco_vol_vdata_rimage_0,vdo_internal_deco_vol_vdata_rimage_1,vdo_internal_deco_vol_vdata_rimage_2,vdo_internal_deco_vol_vdata_rmeta_0,vdo_internal_deco_vol_vdata_rmeta_1,vdo_internal_deco_vol_vdata_rmeta_2}
      Command on LV p_r5/vdo_internal_deco_vol_vdata_rimage_0 uses options that require LV types raid .
      Command not permitted on LV p_r5/vdo_internal_deco_vol_vdata_rimage_0.
      Command on LV p_r5/vdo_internal_deco_vol_vdata_rmeta_0 uses options that require LV types raid .
      Command not permitted on LV p_r5/vdo_internal_deco_vol_vdata_rmeta_0.
      Command on LV p_r5/vdo_internal_deco_vol_vdata_rimage_1 uses options that require LV types raid .
      Command not permitted on LV p_r5/vdo_internal_deco_vol_vdata_rimage_1.
      Command on LV p_r5/vdo_internal_deco_vol_vdata_rmeta_1 uses options that require LV types raid .
      Command not permitted on LV p_r5/vdo_internal_deco_vol_vdata_rmeta_1.
      Command on LV p_r5/vdo_internal_deco_vol_vdata_rimage_2 uses options that require LV types raid .
      Command not permitted on LV p_r5/vdo_internal_deco_vol_vdata_rimage_2.
      Command on LV p_r5/vdo_internal_deco_vol_vdata_rmeta_2 uses options that require LV types raid .
      Command not permitted on LV p_r5/vdo_internal_deco_vol_vdata_rmeta_2.
      Logical volume p_r5/vdo_internal_deco_vol_vdata changed.

This resulted in exactly one volume actually having its recovery rate changed: `[vdo_internal_deco_vol_vdata]`.
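
So in hindsight the only invocation that is actually needed is probably the one that
targets that sub-LV directly (giving the name without the brackets that lvs prints):

    $ lvchange --maxrecoveryrate 64M p_r5/vdo_internal_deco_vol_vdata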

With that done, I tried removing a disk under load, and it resulted in the same old
lockup reports:

-----------------------

So to sum up: the lvchange command only managed to change the recovery rate on a single
internal volume, and even that didn't make the lockup go away.
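
If it helps with debugging, I can also double-check whether the new rate actually reaches
the kernel; as far as I understand it should show up both in the lvs report and as a
max_recovery_rate parameter in the dm-raid table (a sketch, assuming the dm device name
follows the usual VG-LV naming):

    $ lvs -a -o lv_name,raid_max_recovery_rate p_r5
    # dmsetup table p_r5-vdo_internal_deco_vol_vdata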




