Re: XFS mount timeout in linux-6.9.11

Anders Blomdell <anders.blomdell@xxxxxxxxx> · Tue, 13 Aug 2024 14:01:57 +0200

On 2024-08-13 11:19, Dave Chinner wrote:
On Mon, Aug 12, 2024 at 03:03:49PM +0200, Anders Blomdell wrote:
On 2024-08-12 02:04, Dave Chinner wrote:

Ok, can you run the same series of commands, but this time run this
command in another shell and leave it running for the entire
mount/unmount/mount/unmount sequence:

# trace-cmd record -e xfs\* -e printk

[snip location of trace]

That will tell me what XFS is doing differently at mount time on the
two kernels.

Looks like a timing issue: a trylock fails and triggers a READ_AHEAD burst.

Not timing - it is definitely a bug in the commit the bisect pointed
at.

However, it's almost impossible to actually see until someone or
something (the trace) points it out directly.

The trace confirmed what I suspected - the READ_AHEAD stuff you see
is an inode btree being walked. I knew that we walk the free inode
btrees during mount unless you have a specific feature bit set, but
I didn't think your filesystem was new enough to have that feature
set according to the xfs_info output.

However, I couldn't work out why the free inode btrees would take
that long to walk as the finobt generally tends towards empty on any
filesystem that is frequently allocating inodes. The mount time on
the old kernel indicates they are pretty much empty, because the
mount time is under a second and it's walked all 8 finobts *twice*
during mount.

What the trace pointed out was that the finobt walk to calculate
AG reserve space wasn't actually walking the finobt - it was walking
the inobt. That indexes all allocated inodes, so mount was walking
the btrees that index the ~30 million allocated inodes in the
filesystem. That takes a lot of IO, and that's the 450s pause
to calculate reserves before we run log recovery. The second 450s
pause occurs after log recovery, because we have to recalculate the
reserves once all the intents and unlinked inodes have been
replayed.

From that observation, it was just a matter of tracking down the
code that is triggering the walk and working out why it was running
down the wrong btree....

In hindsight, this was a wholly avoidable bug - a single patch made
two different API modifications that only differed by a single
letter, and one of the 23 conversions missed a single letter. If
that had been two patches - one for the finobt conversion, the
second for the inobt conversion - the bug would have been plainly
obvious during review....
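
To illustrate how this class of bug slips through, here is a toy
Python model. It is not the kernel code and not the patch; every name
in it (AllocationGroup, inobt_cursor, finobt_cursor and the two count
helpers) is hypothetical. It assumes only what is described above: the
inobt indexes all allocated inodes, the finobt indexes only those with
free space and is usually close to empty, and the two cursor helpers
differ by a single letter.

```python
from dataclasses import dataclass, field


@dataclass
class AllocationGroup:
    """Toy stand-in for one AG; both 'btrees' are plain lists here."""
    inobt: list = field(default_factory=list)   # every allocated inode record
    finobt: list = field(default_factory=list)  # only records with free inodes


def inobt_cursor(ag):
    """Hypothetical helper: iterate over the allocated-inode index."""
    return iter(ag.inobt)


def finobt_cursor(ag):
    """Hypothetical helper: iterate over the free-inode index."""
    return iter(ag.finobt)


def finobt_count_records_buggy(ag):
    # One letter short: a finobt helper that opens an inobt cursor.
    return sum(1 for _ in inobt_cursor(ag))


def finobt_count_records_fixed(ag):
    # The intended walk: only the (small) free-inode index is visited.
    return sum(1 for _ in finobt_cursor(ag))


# Made-up sizes, chosen only to show the difference in work done.
ag = AllocationGroup(inobt=list(range(100_000)), finobt=[7, 42])

print(finobt_count_records_buggy(ag))  # 100000
print(finobt_count_records_fixed(ag))  # 2
```

Both helpers return a plausible integer and nothing fails, so the only
symptom is the cost of the walk. That is consistent with the bug
showing up as a slow mount rather than as an error.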

Anders, can you try the patch below? It should fix your issue.

Works like a charm! Thanks for the help!

I take it that this patch goes into linux-stable (and linux-next) quite soon!

/Anders