On Sat, Aug 10, 2024 at 10:29:38AM +0200, Anders Blomdell wrote:
> 
> 
> On 2024-08-10 00:55, Dave Chinner wrote:
> > On Fri, Aug 09, 2024 at 07:08:41PM +0200, Anders Blomdell wrote:
> > > With a filesystem that contains a very large amount of hardlinks
> > > the time to mount the filesystem skyrockets to around 15 minutes
> > > on 6.9.11-200.fc40.x86_64 as compared to around 1 second on
> > > 6.8.10-300.fc40.x86_64,
> > 
> > That sounds like the filesystem is not being cleanly unmounted on
> > 6.9.11-200.fc40.x86_64 and so is having to run log recovery on the
> > next mount and so is recovering lots of hardlink operations that
> > weren't written back at unmount.
> > 
> > Hence this smells like an unmount or OS shutdown process issue, not
> > a mount issue. e.g. if something in the shutdown scripts hangs,
> > systemd may time out the shutdown and power off/reboot the machine
> > without completing the full shutdown process. The result of this is
> > the filesystem has to perform recovery on the next mount and so you
> > see a long mount time because of some other unrelated issue.
> > 
> > What is the dmesg output for the mount operations? That will tell us
> > if journal recovery is the difference for certain. Have you also
> > checked to see what is happening in the shutdown/unmount process
> > before the long mount times occur?
> echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
> mount /dev/vg1/test /test
> echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
> umount /test
> echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
> mount /dev/vg1/test /test
> echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
>
> [55581.470484] 6.8.0-rc4-00129-g14dd46cf31f4 09:17:20
> [55581.492733] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
> [56048.292804] XFS (dm-7): Ending clean mount
> [56516.433008] 6.8.0-rc4-00129-g14dd46cf31f4 09:32:55

So it took ~450s to determine that the mount was clean, then another
450s to return to userspace?

> [56516.434695] XFS (dm-7): Unmounting Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
> [56516.925145] 6.8.0-rc4-00129-g14dd46cf31f4 09:32:56
> [56517.039873] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
> [56986.017144] XFS (dm-7): Ending clean mount
> [57454.876371] 6.8.0-rc4-00129-g14dd46cf31f4 09:48:34

Same again.

Can you post the 'xfs_info /mnt/pt' output for that filesystem?

> And rebooting to the kernel before the offending commit:
>
> [ 60.177951] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:00
> [ 61.009283] SGI XFS with ACLs, security attributes, realtime, scrub, quota, no debug enabled
> [ 61.017422] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
> [ 61.351100] XFS (dm-7): Ending clean mount
> [ 61.366359] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
> [ 61.367673] XFS (dm-7): Unmounting Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
> [ 61.444552] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
> [ 61.459358] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
> [ 61.513938] XFS (dm-7): Ending clean mount
> [ 61.524056] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01

Yeah, that's what I'd expect to see.

But, hold on: the kernel version you are testing is apparently in the
middle of 6.8-rc4. This commit wasn't merged until 6.9-rc1, and there
were no XFS changes merged between 6.8-rc3 and 6.8-rc6. So as the
bisect is walking back in time through the XFS commits, the base
kernel is also changing. Hence there's a lot more change in the kernel
being tested by each bisect step than just the XFS commits, right?
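If that's the case, it may be worth redoing the bisect along the
first-parent (merge) order so the base kernel stays consistent from
step to step. A rough sketch - the endpoints below are only my guess,
substitute whatever good/bad kernels you actually tested:

  # only test commits along the mainline (first-parent) history
  git bisect start --first-parent
  git bisect bad v6.9-rc1    # first kernel that showed the slow mount
  git bisect good v6.8       # last kernel that mounted quickly
  # ... build, test, mark good/bad as usual ...
  # then capture the whole session so it can be posted
  git bisect log > bisect.log

That way each step only changes which mainline merge is being tested,
rather than jumping into the middle of topic branches.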
This smells like a bisect jumping randomly backwards in time as it
lands inside merges rather than bisecting the order in which commits
were merged into the main tree. Can you post the full bisect log?

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx