Re: [PATCH] md/raid6: use valid sector values to determine if an I/O should wait on the reshape

On Tue, 2023-11-28 at 10:44 -0800, Song Liu wrote:
> Hi David and Laurence,
> 
> On Tue, Nov 28, 2023 at 10:13 AM David Jeffery <djeffery@xxxxxxxxxx>
> wrote:
> > 
> > During a reshape of a RAID6 array, such as expanding it by adding an
> > additional disk, I/Os to the region of the array which has not yet
> > been reshaped can stall indefinitely. This is caused by errors in the
> > stripe_ahead_of_reshape function which make md think the I/O targets
> > a region actively undergoing the reshape.
> > 
> > stripe_ahead_of_reshape fails to account for the q disk having a
> > sector value of 0. By not excluding the q disk from the for loop,
> > raid6 will always generate a min_sector value of 0, causing a return
> > value which stalls the I/O.
> > 
> > The function's max_sector calculation also uses min() when it should
> > use max(), causing the max_sector value to always be 0. During a
> > backwards rebuild this can cause the opposite problem, where I/O is
> > allowed to advance when it should wait.
> > 
> > Fixing these errors allows safe I/O to advance in a timely manner and
> > delays only I/O which is unsafe because its stripes are in the middle
> > of the reshape.
> > 
> > Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
> > Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> > Tested-by: Laurence Oberman <loberman@xxxxxxxxxx>
> 
> Thanks for the fix!
> 
> Can we add a test to mdadm/tests to cover this case? (Not sure how
> reliable the test will be.)
> 
> Song
> 
> > ---
> >  drivers/md/raid5.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index dc031d42f53b..26e1e8a5e941 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -5892,11 +5892,11 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
> >         int dd_idx;
> > 
> >         for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> > -               if (dd_idx == sh->pd_idx)
> > +               if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
> >                         continue;
> > 
> >                 min_sector = min(min_sector, sh->dev[dd_idx].sector);
> > -               max_sector = min(max_sector, sh->dev[dd_idx].sector);
> > +               max_sector = max(max_sector, sh->dev[dd_idx].sector);
> >         }
> > 
> >         spin_lock_irq(&conf->device_lock);
> > --
> > 2.43.0
> > 
> > 
> 
Hello Song

The way this was discovered is that a customer forced a reshape of a
RAID6 device by adding an additional device. The reshape then sees the
hang.

# mdadm /dev/md0 -a /dev/nvme18n1
mdadm: added /dev/nvme18n1

# mdadm --grow --raid-devices=6 /dev/md0

The problem was that we needed a test kernel just for the customer to
recover, because on reboot the reshape restarts and hangs access to the
device.

These steps could be added as a test, but David may know an easier way
to test it. A rough sketch of such a test is below.
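
A minimal sketch along those lines, assuming loop devices stand in for
the customer's NVMe drives (the device names, sizes, and dd parameters
are placeholders, not what the customer ran, and this is not yet wired
into the mdadm/tests helpers):

  # Back five component devices plus one spare with sparse files.
  truncate -s 200M /tmp/d0 /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4 /tmp/d5
  for i in 0 1 2 3 4 5; do losetup /dev/loop$i /tmp/d$i; done

  # Create a 5-device RAID6 array and wait for the initial sync.
  mdadm --create /dev/md0 --level=6 --raid-devices=5 \
        /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4
  mdadm --wait /dev/md0

  # Add a spare and grow to 6 devices, starting the reshape.
  mdadm /dev/md0 -a /dev/loop5
  mdadm --grow --raid-devices=6 /dev/md0

  # Write to a region the reshape has likely not reached yet; on an
  # affected kernel this stalls indefinitely instead of completing.
  dd if=/dev/zero of=/dev/md0 bs=1M count=64 seek=400 oflag=direct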

Regards
Laurence
