On Mon, Aug 12, 2024 at 03:03:49PM +0200, Anders Blomdell wrote:
> On 2024-08-12 02:04, Dave Chinner wrote:
> >
> > Ok, can you run the same series of commands but this time in another
> > shell run this command and leave it running for the entire
> > mount/unmount/mount/unmount sequence:
> >
> > # trace-cmd record -e xfs\* -e printk

[snip location of trace]

> > That will tell me what XFS is doing different at mount time on the
> > different kernels.
>
> Looks like a timing issue, a trylock fails and brings about a READ_AHEAD burst.

Not timing - it is definitely a bug in the commit the bisect pointed
at. However, it's almost impossible to actually see until someone or
something (the trace) points it out directly.

The trace confirmed what I suspected - the READ_AHEAD stuff you see
is an inode btree being walked. I knew that we walk the free inode
btrees during mount unless you have a specific feature bit set, but
I didn't think your filesystem is new enough to have that feature
set according to the xfs_info output.

However, I couldn't work out why the free inode btrees would take
that long to walk as the finobt generally tends towards empty on any
filesystem that is frequently allocating inodes. The mount time on
the old kernel indicates they are pretty much empty, because the
mount time is under a second and it's walked all 8 finobts *twice*
during mount.

What the trace pointed out was that the finobt walk to calculate AG
reserve space wasn't actually walking the finobt - it was walking
the inobt. That indexes all allocated inodes, so mount was walking
the btrees that index the ~30 million allocated inodes in the
filesystem. That takes a lot of IO, and that's the 450s pause to
calculate reserves before we run log recovery, and then the second
450s pause occurs after log recovery because we have to recalculate
the reserves once all the intents and unlinked inodes have been
replayed.
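
To make the difference between the two walks concrete, here is a toy model in Python. It is only a sketch and not kernel code: `ChunkRecord`, `build_indexes`, `leaf_blocks_walked` and the `RECORDS_PER_BLOCK` value are all hypothetical names and numbers chosen for illustration, and the record counts are deliberately tiny. The only points it relies on from the above are that the inobt indexes every allocated inode chunk, the finobt indexes only chunks that still have free inodes, and the reserve calculation runs twice per mount.

```python
from dataclasses import dataclass

INODES_PER_CHUNK = 64  # one inode btree record describes a 64-inode chunk


@dataclass
class ChunkRecord:
    start_ino: int   # first inode number covered by this record
    free_count: int  # how many inodes in the chunk are still free


def build_indexes(records):
    """Return (inobt, finobt) as plain lists standing in for the btrees.

    The inobt holds every allocated chunk; the finobt holds only the
    chunks that still contain at least one free inode.
    """
    inobt = list(records)
    finobt = [r for r in records if r.free_count > 0]
    return inobt, finobt


def leaf_blocks_walked(index, records_per_block):
    """Leaf blocks a full walk has to read (ceiling division)."""
    return -(-len(index) // records_per_block)


# Hypothetical filesystem: 1000 allocated chunks, only 3 with free inodes.
records = [
    ChunkRecord(start_ino=i * INODES_PER_CHUNK, free_count=1 if i < 3 else 0)
    for i in range(1000)
]
inobt, finobt = build_indexes(records)

RECORDS_PER_BLOCK = 100  # hypothetical; the real value depends on block size
WALKS_PER_MOUNT = 2      # once before log recovery, once after

intended = WALKS_PER_MOUNT * leaf_blocks_walked(finobt, RECORDS_PER_BLOCK)
buggy = WALKS_PER_MOUNT * leaf_blocks_walked(inobt, RECORDS_PER_BLOCK)
print(f"finobt walk: {intended} leaf blocks per mount")
print(f"inobt walk:  {buggy} leaf blocks per mount")
```

The cost of the intended walk tracks the number of chunks with free inodes, which stays small on a filesystem that keeps allocating inodes. The cost of the buggy walk tracks the total number of allocated chunks, and it is paid twice per mount.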