Re: Experiencing md raid5 hang and CPU lockup on kernel v6.11

Hi,

On 2024/11/15 17:26, Haris Iqbal wrote:
On Thu, Nov 14, 2024 at 1:54 PM Jinpu Wang <jinpu.wang@xxxxxxxxx> wrote:

On Thu, Nov 14, 2024 at 1:19 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

On 2024/11/14 18:27, Jinpu Wang wrote:
Do you want us to try the following change on top of the md/md-6.13
branch without Xiao's patch and your fixup alone, or combine them all
together?

Combine them please, sorry that I forgot to mention it.

And there will be conflicts on md/md-6.13, so I think trying v6.11 is
better.
Thanks for the clarification.
I have to cherry-pick the following 3 commits so that it applies cleanly on v6.11.5:

6f039cc42f21 md/raid5: rename wait_for_overlap to wait_for_reshape
0e4aac736666 md/raid5: only add to wq if reshape is in progress
e6a03207b925 md/raid5: use wait_on_bit() for R5_Overlap

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2868e2e20dea..6df5e9e65494 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5867,17 +5867,6 @@ static int add_all_stripe_bios(struct r5conf *conf,
                         wait_on_bit(&dev->flags, R5_Overlap, TASK_UNINTERRUPTIBLE);
                         return 0;
                 }
-       }
-
-       for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
-               struct r5dev *dev = &sh->dev[dd_idx];
-
-               if (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
-                       continue;
-
-               if (dev->sector < ctx->first_sector ||
-                   dev->sector >= ctx->last_sector)
-                       continue;

                 __add_stripe_bio(sh, bi, dd_idx, forwrite, previous);
                 clear_bit((dev->sector - ctx->first_sector) >>

Will report back the result.

Ran the above patches and changes, and there was no hang.

Thanks for the test! Although I'm not 100% sure about my solution for now,
at least the problem is located.

Give me some time to sort things out. :)

Thanks,
Kuai





BTW: we have been hitting similar hangs since kernel 4.19.

Good to know. I think Xiao's patch alone is fine for 4.19; the
BUG_ON() probably won't be triggered.

Thx!

Thanks,
Kuai


