Re: Experiencing md raid5 hang and CPU lockup on kernel v6.11

On Thu, Nov 14, 2024 at 1:54 PM Jinpu Wang <jinpu.wang@xxxxxxxxx> wrote:
>
> On Thu, Nov 14, 2024 at 1:19 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On 2024/11/14 18:27, Jinpu Wang wrote:
> > > Do you want us to try the following change on top of the md/md-6.13
> > > branch without Xiao's patch and your fixup alone, or combine them all
> > > together?
> >
> > Combine them please, sorry that I forgot to mention it.
> >
> > And for md/md-6.13 there will be conflicts, so trying v6.11 is better, I
> > think.
> Thanks for the clarification.
> I have to cherry-pick the following 3 commits for it to apply cleanly on v6.11.5:
>
> 6f039cc42f21 md/raid5: rename wait_for_overlap to wait_for_reshape
> 0e4aac736666 md/raid5: only add to wq if reshape is in progress
> e6a03207b925 md/raid5: use wait_on_bit() for R5_Overlap
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 2868e2e20dea..6df5e9e65494 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5867,17 +5867,6 @@ static int add_all_stripe_bios(struct r5conf *conf,
>                         wait_on_bit(&dev->flags, R5_Overlap,
> TASK_UNINTERRUPTIBLE);
>                         return 0;
>                 }
> -       }
> -
> -       for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> -               struct r5dev *dev = &sh->dev[dd_idx];
> -
> -               if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
> -                       continue;
> -
> -               if (dev->sector < ctx->first_sector ||
> -                   dev->sector >= ctx->last_sector)
> -                       continue;
>
>                 __add_stripe_bio(sh, bi, dd_idx, forwrite, previous);
>                 clear_bit((dev->sector - ctx->first_sector) >>
>
> Will report back the result.

I ran with the above patches and changes applied, and there was no hang.
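
For anyone who wants to reproduce the tree, the cherry-pick step quoted above can be sketched roughly as below. This is only a sketch under assumptions: the helper name prepare_tree and the branch name md-raid5-test are made up for illustration, the tree is assumed to already contain the v6.11.5 tag and the three commits, and the commits are taken in the order they were listed, which may need reversing if that list is newest-first. Xiao's patch and the raid5.c change above still have to be applied on top by hand.

```shell
#!/bin/sh
# Prerequisite commits, in the order they were listed in the mail.
# Assumption: if that list is newest-first (as in git log output),
# reverse it before cherry-picking.
COMMITS="6f039cc42f21 0e4aac736666 e6a03207b925"

# Hypothetical helper: create a test branch at v6.11.5 in the given
# kernel tree and cherry-pick the prerequisite commits onto it.
prepare_tree() {
    tree="$1"
    # md-raid5-test is an illustrative branch name, not from the thread.
    git -C "$tree" checkout -b md-raid5-test v6.11.5 || return 1
    for c in $COMMITS; do
        git -C "$tree" cherry-pick "$c" || return 1
    done
}
```

Usage would be something like `prepare_tree /path/to/linux`, after which the remaining patches from this thread go on top.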

>
> >
> > >
> > > BTW: we have been hitting similar hangs since kernel 4.19.
> >
> > Good to know. I think Xiao's patch alone is fine for 4.19; the
> > BUG_ON() probably won't be triggered.
>
> Thx!
> >
> > Thanks,
> > Kuai
> >
> >




