Re: XFS mount timeout in linux-6.9.11

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 13 Aug 2024 19:19:28 +1000

On Mon, Aug 12, 2024 at 03:03:49PM +0200, Anders Blomdell wrote:
> On 2024-08-12 02:04, Dave Chinner wrote:
> > 
> > Ok, can you run the same series of commands but this time in another
> > shell run this command and leave it running for the entire
> > mount/unmount/mount/unmount sequence:
> > 
> > # trace-cmd record -e xfs\* -e printk

[snip location of trace]

> > That will tell me what XFS is doing different at mount time on the
> > different kernels.
> Looks like a timing issue, a trylock fails and brings about a READ_AHEAD burst.

Not timing - it is definitely a bug in the commit the bisect pointed
at.

However, it's almost impossible to actually see until someone or
something (the trace) points it out directly.

The trace confirmed what I suspected - the READ_AHEAD stuff you see
is an inode btree being walked. I knew that we walk the free inode
btrees during mount unless you have a specific feature bit set, but
I didn't think your filesystem was new enough to have that feature
set according to the xfs_info output.

However, I couldn't work out why the free inode btrees would take
that long to walk as the finobt generally tends towards empty on any
filesystem that is frequently allocating inodes. The mount time on
the old kernel indicates they are pretty much empty, because the
mount time is under a second and it's walked all 8 finobts *twice*
during mount.

What the trace pointed out was that the finobt walk to calculate
AG reserve space wasn't actually walking the finobt - it was walking
the inobt. That indexes all allocated inodes, so mount was walking
the btrees that index the ~30 million allocated inodes in the
filesystem. That takes a lot of IO, and it accounts for the first
450s pause: calculating the reserves before we run log recovery.
The second 450s pause occurs after log recovery because we have to
recalculate the reserves once all the intents and unlinked inodes
have been replayed.
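
To give a rough feel for why walking the wrong tree hurts so much,
here is a toy Python model. It is not the kernel code: the ToyBtree
class, the function names and the block counts are all made up for
illustration. Only the 8 AGs and the two walks per mount come from
the description above.

```python
from dataclasses import dataclass


@dataclass
class ToyBtree:
    """Hypothetical stand-in for a per-AG btree; not an XFS structure."""
    name: str
    blocks: int  # blocks a full walk of this tree has to read


def mount_walk_cost(tree: ToyBtree, ags: int = 8, passes: int = 2) -> int:
    """Blocks read when every AG's tree is fully walked `passes` times.

    ags=8 and passes=2 mirror the mount described above (once before
    log recovery, once after). Every AG is assumed to hold an
    identical tree, which is a simplification.
    """
    return ags * passes * tree.blocks


# Assumed sizes: a nearly empty finobt versus an inobt that indexes
# every allocated inode. The real block counts are not known here.
finobt = ToyBtree("finobt", blocks=1)
inobt = ToyBtree("inobt", blocks=10_000)

intended = mount_walk_cost(finobt)  # what the reservation code should walk
buggy = mount_walk_cost(inobt)      # what it walked with the bad commit

print(f"intended walk: {intended} blocks")
print(f"buggy walk:    {buggy} blocks")
```

The absolute numbers mean nothing. The point is that the cost scales
with the size of whichever tree gets walked, and that it is paid
twice per mount.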