Re: XFS mount timeout in linux-6.9.11

On 2024-08-10 00:55, Dave Chinner wrote:
> On Fri, Aug 09, 2024 at 07:08:41PM +0200, Anders Blomdell wrote:
>> With a filesystem that contains a very large number of hardlinks,
>> the time to mount the filesystem skyrockets to around 15 minutes
>> on 6.9.11-200.fc40.x86_64 as compared to around 1 second on
>> 6.8.10-300.fc40.x86_64,

> That sounds like the filesystem is not being cleanly unmounted on
> 6.9.11-200.fc40.x86_64 and so is having to run log recovery on the
> next mount and so is recovering lots of hardlink operations that
> weren't written back at unmount.
>
> Hence this smells like an unmount or OS shutdown process issue, not
> a mount issue. e.g. if something in the shutdown scripts hangs,
> systemd may time out the shutdown and power off/reboot the machine
> without completing the full shutdown process. The result of this is
> that the filesystem has to perform recovery on the next mount, and so
> you see a long mount time because of some other unrelated issue.
>
> What is the dmesg output for the mount operations? That will tell us
> if journal recovery is the difference for certain. Have you also
> checked to see what is happening in the shutdown/unmount process
> before the long mount times occur?
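
A quick way to confirm that from the logs (a minimal sketch; the dm-7
device name is taken from the dmesg output below) is to grep for the
XFS recovery/clean-mount messages around each mount:

  # A dirty log would show a "Starting recovery" line between
  # "Mounting V5 Filesystem" and the end-of-mount message; a clean
  # mount only prints "Ending clean mount".
  dmesg | grep -E 'XFS \(dm-7\): (Mounting|Starting recovery|Ending)'

Here is the test I ran and the resulting dmesg output:
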
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
mount /dev/vg1/test /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
umount /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
mount /dev/vg1/test /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg

[55581.470484] 6.8.0-rc4-00129-g14dd46cf31f4 09:17:20
[55581.492733] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56048.292804] XFS (dm-7): Ending clean mount
[56516.433008] 6.8.0-rc4-00129-g14dd46cf31f4 09:32:55
[56516.434695] XFS (dm-7): Unmounting Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56516.925145] 6.8.0-rc4-00129-g14dd46cf31f4 09:32:56
[56517.039873] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56986.017144] XFS (dm-7): Ending clean mount
[57454.876371] 6.8.0-rc4-00129-g14dd46cf31f4 09:48:34

And rebooting to the kernel before the offending commit:

[   60.177951] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:00
[   61.009283] SGI XFS with ACLs, security attributes, realtime, scrub, quota, no debug enabled
[   61.017422] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[   61.351100] XFS (dm-7): Ending clean mount
[   61.366359] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
[   61.367673] XFS (dm-7): Unmounting Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[   61.444552] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
[   61.459358] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[   61.513938] XFS (dm-7): Ending clean mount
[   61.524056] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
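
For reference, the echo timestamps bracket each mount (using the second
mount in each run to exclude module load):

  57454.876371 - 56516.925145 ≈ 938 s (~15.6 min) with 14dd46cf31f4 applied
  61.524056    - 61.444552    ≈ 0.08 s            one commit earlier

Both runs report "Ending clean mount", so log recovery is not the
difference; on the slow kernel roughly half of the time (~469 s) passes
before "Ending clean mount" is printed and the rest after it.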



>> this of course makes booting drop
>> into emergency mode if the filesystem is in /etc/fstab. A git bisect
>> nails the offending commit as 14dd46cf31f4aaffcf26b00de9af39d01ec8d547.

> Commit 14dd46cf31f4 ("xfs: split xfs_inobt_init_cursor") doesn't
> seem like a candidate for any sort of change of behaviour. It's just
> a refactoring patch that doesn't change any behaviour at all. Are you
> sure the reproducer you used for the bisect is reliable?
Yes.
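
(For what it's worth, one extra sanity check, sketched here under the
assumption of a local kernel build tree, is to revert just that commit
on top of the bad kernel and repeat the mount test above:

  # Hypothetical commands; adjust to the local build/install workflow.
  git revert --no-edit 14dd46cf31f4aaffcf26b00de9af39d01ec8d547
  make -j"$(nproc)" && sudo make modules_install install

If the long mounts disappear with the revert, the bisect result is
solid.)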

>> The filesystem is a collection of daily snapshots of a live filesystem
>> collected over a number of years, organized as a store of unique files
>> that are reflinked to inodes carrying the actual {owner,group,permission,
>> mtime}, and these inodes are hardlinked into the daily snapshot trees.
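
Roughly, the layout looks like this (hypothetical paths and names, just
to illustrate the structure described above):

  # One unique data file, reflinked into a per-version inode that
  # carries the metadata, which is then hardlinked into every daily tree.
  cp --reflink=always /store/unique/FILE /store/meta/FILE.v1
  chown someuser:somegroup /store/meta/FILE.v1
  touch -d '2024-08-09 12:00' /store/meta/FILE.v1
  ln /store/meta/FILE.v1 /snapshots/2024-08-09/path/to/file
  ln /store/meta/FILE.v1 /snapshots/2024-08-10/path/to/file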

> So it's reflinks and hardlinks. Recovering a reflink takes a lot
> more CPU time and journal traffic than recovering a hardlink, so
> that will also be a contributing factor.

>> The numbers for the filesystem are:
>>
>>    Total file size:           3.6e+12 bytes
>
> 3.6TB, not a large data set by any measurement.
>
>>    Unique files:             12.4e+06
>
> 12M files, not a lot.
>
>>    Reflink inodes:           18.6e+06
>
> 18M inodes with shared extents, not a huge number, either.
>
>>    Hardlinks:                15.7e+09
>
> Ok, 15.7 billion hardlinks is a *lot*.
:-)

> And by a lot, I mean that's the largest number of hardlinks in an
> XFS filesystem I've personally ever heard about in 20 years.
Glad to be of service.


> As a warning: hope like hell you never have a disaster with that
> storage and need to run xfs_repair on that filesystem. If you don't
> have many, many TBs of RAM, just checking that the hardlinks resolve
> correctly could take billions of IOs...
I hope so as well :-), but it is not a critical system (it is used for
testing and statistics, though it would take about a month to rebuild :-/).


> -Dave.



