Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected

> I have not root-caused this yet, but would like to share some findings from 
> the vmcore Dan shared. From what I can see, this doesn't look like an md 
> issue, but something wrong with the block layer or below.

Below is one other thing I found that might be of interest. This is
from the original email thread [1] that was linked to in the original
issue from 2022, which the change in question reverts:

On 2022-09-02 17:46, Logan Gunthorpe wrote:
> I've made some progress on this nasty bug. I've got far enough to know it's not
> related to the blk-wbt or the block layer.
> 
> Turns out a bunch of bios are stuck queued in a blk_plug in the md_raid5 
> thread while that thread appears to be stuck in an infinite loop (so it never
> schedules or does anything to flush the plug). 
> 
> I'm still debugging to try and find out the root cause of that infinite loop, 
> but I just wanted to send an update that the previous place I was stuck at
> was not correct.
> 
> Logan

This certainly sounds like it has some similarities to what we are
seeing when that change is reverted. The md0_raid5 thread appears to be
in an infinite loop, consuming 100% CPU, but not actually doing any
work.
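To make the plug part of Logan's description concrete, here is a rough user-space toy model. It is not kernel code, and ToyPlug and toy_worker are hypothetical names made up for this sketch. Bios submitted while a plug is active are parked on the plug. They only reach the device when the plug is flushed. In the kernel that roughly corresponds to blk_finish_plug() or the task sleeping in schedule(), and here both are collapsed into a single flush() call. A worker that spins without ever reaching that point leaves its bios stranded on the plug.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPlug:
    """Hypothetical stand-in for the kernel's struct blk_plug (toy model only).

    Bios submitted while plugged are parked here instead of being
    dispatched to the device right away.
    """
    pending: list = field(default_factory=list)
    dispatched: list = field(default_factory=list)

    def submit(self, bio):
        # Plugged: queue locally, do not dispatch yet.
        self.pending.append(bio)

    def flush(self):
        # Stands in for the points where the kernel drains a plug
        # (blk_finish_plug(), or the owning task sleeping in schedule()).
        self.dispatched.extend(self.pending)
        self.pending.clear()


def toy_worker(plug, bios, stuck, spin_limit=100_000):
    """Submit `bios`, then either finish normally or spin without yielding.

    The spin is capped at `spin_limit` iterations only so this demo ends;
    the thread in the report never leaves its loop at all.
    """
    for bio in bios:
        plug.submit(bio)
    if stuck:
        for _ in range(spin_limit):
            pass  # busy loop: never schedules, never flushes the plug
        return
    plug.flush()


healthy, hung = ToyPlug(), ToyPlug()
toy_worker(healthy, ["bio0", "bio1", "bio2"], stuck=False)
toy_worker(hung, ["bio0", "bio1", "bio2"], stuck=True)
print("healthy:", len(healthy.pending), "pending,", len(healthy.dispatched), "dispatched")
print("hung:", len(hung.pending), "pending,", len(hung.dispatched), "dispatched")
```

The healthy worker ends with nothing pending. The hung one still holds all three bios on its plug, and none of them has been dispatched. This matches the "100% CPU, no work done" picture, where the I/O is stuck behind the spinning thread and not lost below the block layer.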

-- Dan

[1] https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@xxxxxxxxxxxx
