Re: Intermittent stalling of all MD IO, Debian buster (4.19.0-16)

Hi Guoqing,

Thanks for looking at this.

On Wed, Jun 16, 2021 at 11:57:33AM +0800, Guoqing Jiang wrote:
> The above looks like the bio for sb write was throttled by wbt, which caused
> the first calltrace. I am wondering if there were intensive IOs happened to
> the underlying device of md5, which triggered wbt to throttle sb write, or
> can you access the underlying device directly?

Next time it occurs I can check whether I am still able to read
directly from the SSDs that make up the MD device, if that
information would be helpful.
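
A check like that could be scripted roughly as below. This is only a
sketch, not something that has been run on the affected hosts. It
assumes the array's component devices are listed under
/sys/block/<md>/slaves and appear under /dev with the same names. The
helper names (list_md_components, timed_read, check_array) are made up
for illustration. The read uses O_DIRECT so that the page cache cannot
answer it.

```python
import mmap
import os
import time


def list_md_components(md_name, sysfs_root="/sys"):
    """Return the component device names of an MD array, e.g. ['sda1', 'sdb1'].

    Assumes the kernel exposes them under <sysfs_root>/block/<md_name>/slaves.
    """
    slaves_dir = os.path.join(sysfs_root, "block", md_name, "slaves")
    return sorted(os.listdir(slaves_dir))


def timed_read(path, length=1024 * 1024, offset=0, direct=True):
    """Read up to `length` bytes from `path`; return (bytes_read, seconds).

    With direct=True the file is opened with O_DIRECT, so `length` and
    `offset` should be multiples of the device's logical block size.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT
    # An anonymous mmap is page-aligned, which O_DIRECT requires.
    buf = mmap.mmap(-1, length)
    fd = os.open(path, flags)
    try:
        start = time.monotonic()
        n = os.preadv(fd, [buf], offset)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        buf.close()
    return n, elapsed


def check_array(md_name):
    """Time one direct read from each component of the given MD array."""
    for dev in list_md_components(md_name):
        n, secs = timed_read(os.path.join("/dev", dev))
        print(f"{dev}: read {n} bytes in {secs:.3f}s")


# Example (needs root): check_array("md5")
```

If the underlying device is stalled as well, the read would be expected
to block rather than return, which is useful information in itself.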

I have never been able to reproduce the problem in a test
environment, so it probably only happens under heavy load.

> And there was a report [1] for raid5 which may related to wbt throttle as
> well, not sure if the change [2] could help or not.
> 
> [1]. https://lore.kernel.org/linux-raid/d3fced3f-6c2b-5ffa-fd24-b24ec6e7d4be@xxxxxxxxxxxx/
> [2]. https://lore.kernel.org/linux-raid/cb0f312e-55dc-cdc4-5d2e-b9b415de617f@xxxxxxxxx/

All of my MD arrays tend to be RAID-1 or RAID-10, two devices, no
journal, internal bitmap. I see the reporter in [1] was using RAID-6
with an external write journal. I can still build a kernel with the
patch from [2] and try it out, if you think it could possibly help.
The long time between incidents obviously makes testing extra
challenging.

As a next step I have put the buster-backports kernel package
(5.10.24-1~bpo10+1) on two test servers, and I will also boot the
production hosts into it if they experience the problem again.

Thanks,
Andy


