Re: stalling IO regression since linux 5.12, through 5.18

Ming Lei <ming.lei@xxxxxxxxxx> · Wed, 17 Aug 2022 23:34:49 +0800

On Wed, Aug 17, 2022 at 11:02:25AM -0400, Chris Murphy wrote:
> 
> 
> On Wed, Aug 17, 2022, at 10:53 AM, Ming Lei wrote:
> > On Wed, Aug 17, 2022 at 10:34:38AM -0400, Chris Murphy wrote:
> >> 
> >> 
> >> On Wed, Aug 17, 2022, at 8:06 AM, Ming Lei wrote:
> >> 
> >> > blk-mq debugfs log is usually helpful for io stall issue, care to post
> >> > the blk-mq debugfs log:
> >> >
> >> > (cd /sys/kernel/debug/block/$disk && find . -type f -exec grep -aH . {} \;)
> >> 
> >> This is only sda
> >> https://drive.google.com/file/d/1aAld-kXb3RUiv_ShAvD_AGAFDRS03Lr0/view?usp=sharing
> >
> > From the log, there isn't any in-flight IO request.
> >
> > So please confirm that it is collected after the IO stall is triggered.
> 
> Yes, iotop reports no reads or writes at the time of collection. IO pressure 99% for auditd, systemd-journald, rsyslogd, and postgresql, with increasing pressure from all the qemu processes.
> 
> Keep in mind this is a raid10, so maybe it's enough for just one block device IO to stall and the whole thing stops? That's why I included all block devices.
> 

From the 2nd log, blockdebugfs-all.txt, I still do not see any in-flight IO
on request-based block devices. However, sda is _not_ included in this log;
only sdi, sdg and sdf were collected. Is that expected?

BTW, all request-based block devices should be visible in blk-mq debugfs.
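
To make sure no device is missed, the same find/grep command can be looped
over every directory under the debugfs block tree. This is only a rough
sketch: the function name dump_blkmq_debugfs is made up for illustration, and
it assumes debugfs is mounted at /sys/kernel/debug and that it is run as root.

```shell
#!/bin/sh
# Sketch: dump the blk-mq debugfs state of every block device in one pass.
# Assumption: debugfs is mounted at /sys/kernel/debug (pass another root as $1).
# dump_blkmq_debugfs is a hypothetical helper name, not an existing tool.
dump_blkmq_debugfs() {
    root="${1:-/sys/kernel/debug/block}"
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -d "$dir" ] || continue
        disk=$(basename "$dir")
        echo "=== $disk ==="
        # Same per-disk command as earlier in the thread.
        (cd "$dir" && find . -type f -exec grep -aH . {} \;)
    done
}
```

Running "dump_blkmq_debugfs > blockdebugfs-all.txt" after the stall is
triggered should then give one "=== sdX ===" section per device, so a missing
device such as sda is easy to spot.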

thanks,
Ming