Re: [PATCH] md/raid6: use valid sector values to determine if an I/O should wait on the reshape

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2023-11-28 at 13:11 -0500, David Jeffery wrote:
> During a reshape or a RAID6 array such as expanding by adding an
> additional
> disk, I/Os to the region of the array which have not yet been
> reshaped can
> stall indefinitely. This is from errors in the
> stripe_ahead_of_reshape
> function causing md to think the I/O is to a region in the actively
> undergoing the reshape.
> 
> stripe_ahead_of_reshape fails to account for the q disk having a
> sector
> value of 0. By not excluding the q disk from the for loop, raid6 will
> always
> generate a min_sector value of 0, causing a return value which
> stalls.
> 
> The function's max_sector calculation also uses min() when it should
> use
> max(), causing the max_sector value to always be 0. During a
> backwards
> rebuild this can cause the opposite problem where it allows I/O to
> advance
> when it should wait.
> 
> Fixing these errors will allow safe I/O to advance in a timely manner
> and
> delay only I/O which is unsafe due to stripes in the middle of
> undergoing
> the reshape.
> 
> Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for
> reshape progress")
> Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> Tested-by: Laurence Oberman <loberman@xxxxxxxxxx>
> ---
>  drivers/md/raid5.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index dc031d42f53b..26e1e8a5e941 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct
> mddev *mddev, struct r5conf *conf,
>         int dd_idx;
>  
>         for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> -               if (dd_idx == sh->pd_idx)
> +               if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
>                         continue;
>  
>                 min_sector = min(min_sector, sh->dev[dd_idx].sector);
> -               max_sector = min(max_sector, sh->dev[dd_idx].sector);
> +               max_sector = max(max_sector, sh->dev[dd_idx].sector);
>         }
>  
>         spin_lock_irq(&conf->device_lock);

Nice work David as usual, this issue has lurked for a while it seems.
Thanks for all you do for Red Hat and upstream.

Regards
Laurence






[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux