Re: Tasks blocking forever with XFS stack traces

Carlos Maiolino <cmaiolino@xxxxxxxxxx> · Tue, 5 Nov 2019 11:36:52 +0100

Hi Sitsofe.

...
> <snip>
> > >
> > > Other directories on the same filesystem seem fine as do other XFS
> > > filesystems on the same system.
> >
> > Given that other directories seem to work, and given the first stack trace
> > you posted, it sounds like you've been keeping a single AG so busy that it is
> > almost unusable. But you didn't provide enough information for us to make
> > any real progress here, and to be honest I'm more inclined to point the finger
> > at your MD device.
> 
> Let's see if we can pinpoint something :-)
> 
> > Can you describe your MD device? RAID array? What kind? How many disks?
> 
> RAID6 8 disks.

> 
> > What's your filesystem configuration? (xfs_info <mount point>)
> 
> meta-data=/dev/md126             isize=512    agcount=32, agsize=43954432 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1 spinodes=0 rmapbt=0
>          =                       reflink=0
> data     =                       bsize=4096   blocks=1406538240, imaxpct=5
>          =                       sunit=128    swidth=768 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
						^^^^^^  This should have been
							configured to 8 blocks, not 1

> Yes there's more. See a slightly elided dmesg from a longer run on
> https://sucs.org/~sits/test/kern-20191024.log.gz .

At first glance, it looks like you have a lot of IO contention in the log, and
this is slowing down the rest of the filesystem. What caught my attention first
was the misconfigured log stripe unit (sunit=1 blk), and I wonder if it isn't
responsible for the amount of IO contention you are seeing in the log. It may
well be generating lots of read-modify-write (RMW) cycles on the RAID6 array
when writing to the log, which would cause the contention and slow down the
rest of the filesystem. I'll try to take a more careful look later on.
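
For what it's worth, the geometry arithmetic behind that suspicion can be
sketched in a few lines of Python. This is only an illustration: parse_geometry
and check_log_stripe are hypothetical helpers (not part of xfsprogs), they work
on the xfs_info text quoted above, and the expected log stripe unit of 8 blocks
is simply the value suggested earlier in this mail, passed in as a parameter.

```python
import re

# xfs_info output as quoted in this thread (quote markers stripped).
XFS_INFO = """\
meta-data=/dev/md126             isize=512    agcount=32, agsize=43954432 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=1406538240, imaxpct=5
         =                       sunit=128    swidth=768 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
"""


def parse_geometry(text):
    """Hypothetical helper: collect numeric key=value pairs per section."""
    geo = {}
    section = None
    for line in text.splitlines():
        head, _, rest = line.partition("=")
        if head.strip():
            # A non-empty label starts a new section (meta-data, data, log, ...).
            section = head.split()[0]
        for key, value in re.findall(r"(\w+)=(\d+)", rest):
            geo.setdefault(section, {})[key] = int(value)
    return geo


def check_log_stripe(geo, expected_log_sunit_blks=8):
    """Compare the log stripe unit with the value suggested in this mail."""
    data, log = geo["data"], geo["log"]
    return {
        "data_sunit_bytes": data["sunit"] * data["bsize"],
        "data_disks": data["swidth"] // data["sunit"],
        "log_sunit_bytes": log["sunit"] * log["bsize"],
        "expected_log_sunit_bytes": expected_log_sunit_blks * log["bsize"],
        "log_sunit_ok": log["sunit"] == expected_log_sunit_blks,
    }


geo = parse_geometry(XFS_INFO)
report = check_log_stripe(geo)
for name, value in report.items():
    print(f"{name}: {value}")
```

With the values above, the data stripe unit is 128 * 4096 bytes = 512 KiB
across 6 data disks (which matches an 8-disk RAID6), while the log stripe unit
is a single 4 KiB block rather than 8 * 4096 bytes = 32 KiB.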

I can't say whether there is a known bug related to this issue: first, because
I honestly don't remember one; second, because you are using an old distro
kernel and I have no way of knowing which bug fixes have or haven't been
backported to it. Maybe somebody else remembers a bug that might be related,
but the number of threads you have waiting for log IO, together with that
misconfigured log striping, smells like smoke to me.

I'll let you know if I can identify anything else later.

Cheers.

-- 
Carlos