Currently xfs_iget() will flush inodes queued for inactivation rather than recycling them. Flushing the inodegc queues causes inactivation to run, and the inodes transition to the reclaimable state where they can then be recycled. The xfs_iget() code spins for a short while before trying the lookup again, and will continue to do so until the inode has moved to the reclaimable state, at which point it will recycle the inode.

However, if the filesystem is frozen, we cannot flush the inodegc queues because we can't make modifications during a freeze, and so inodegc is not running. Hence, for inodes that need inactivation but that the VFS then tries to reference again (e.g. the shrinker reclaimed them just before they were accessed), xfs_iget() will just spin on the inode waiting for the freeze to end so the inode can be flushed.

This can be triggered by creating a bunch of files with post-eof blocks and stalling them on the inodegc queues like so:

# cp a-set-1MB-files* /mnt/xfs
# xfs_io -xr -c "freeze" /mnt/xfs
# echo 3 > /proc/sys/vm/drop_caches
# ls -l /mnt/test

If the timing is just right, then the 'ls -l' will hang spinning on inodes, as they are now sitting in the XFS_NEED_INACTIVE state on the inodegc queues and won't be processed until the filesystem is thawed.

Instead of flushing the inode, we could just recycle the inode immediately. That, however, is complicated by the use of lockless single linked lists for the inodegc queues. We can't remove an arbitrary entry from the middle of such a list, so we need to enable lazy removal from the inodegc queue.

To do this, we make the lockless list addition and removal atomic w.r.t. the inode state changes via the ip->i_flags_lock. This lock is also held during xfs_iget() lookups, so it serialises the inodegc list processing against inode lookup as well.

This then enables us to use the XFS_NEED_INACTIVE state flag to determine if the inode should be inactivated when removing it from the inodegc list during inodegc work. i.e. the inodegc worker decides if inactivation should take place, not the context that is queueing the inode to the inodegc list.

Hence by clearing the XFS_NEED_INACTIVE flag, we can leave inodes on the inodegc lists and know that they will not be inactivated when the worker next runs and sees that inode. It will just remove it from the list and skip over it. This gives us lazy list removal, and now we can immediately reactivate the inode during lookup. (A rough userspace sketch of this scheme is appended at the end of this mail.)

This is similar to the recycling of reclaimable inodes, but just a little bit different. I haven't tried to combine the implementations - it could be done, but I think that gets in the way of seeing how reactivation is different from recycling.

By doing this, the above series of operations will no longer hang waiting for a thaw to occur. Indeed, we can see the inode recycle stat getting bumped when the above reproducer is run - it reactivates the inodes instead of hanging:

# xfs_stats.pl | grep recycle
xs_ig_frecycle.......            75
vn_reclaim............          304
# cp a-set-1MB-files* /mnt/xfs
# xfs_io -xr -c "freeze" /mnt/xfs
# echo 3 > /proc/sys/vm/drop_caches
# ls -l /mnt/test > /dev/null
# xfs_stats.pl | grep recycle
xs_ig_frecycle.......           100
vn_reclaim............          330
# xfs_io -xr -c "thaw" /mnt/xfs
# rm -rf /mnt/test/a-set*
# umount /mnt/test
#

This is all pretty simple - I don't think I've missed anything, and it runs the full fstests auto group with a few different configs without regressions.

Comments, thoughts?
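Appendix: for anyone who wants to poke at the idea without digging through the patches, here's a minimal userspace model of the lazy removal scheme. To be clear, this is just my sketch, not the patch code: all names in it (model_inode, I_NEED_INACTIVE, inodegc_queue(), iget_reactivate(), inodegc_worker()) are made up, a pthread mutex plays the role of ip->i_flags_lock, and a mutex-protected singly linked list stands in for the kernel's lockless llist. What it demonstrates is the flag handshake: the queueing context sets the flag and adds the inode under the lock, a racing lookup clears the flag to reactivate, and the worker checks the flag under the lock to decide whether to inactivate or lazily drop the stale entry.

/*
 * Userspace sketch of lazy inodegc list removal -- not the kernel code.
 * A pthread mutex models ip->i_flags_lock; a mutex-protected singly
 * linked list models the lockless llist. All names are hypothetical.
 * Build with: cc -pthread -o inodegc-model inodegc-model.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define I_NEED_INACTIVE	0x1		/* models XFS_NEED_INACTIVE */

struct model_inode {
	pthread_mutex_t		flags_lock;	/* models ip->i_flags_lock */
	unsigned int		flags;
	struct model_inode	*gc_next;	/* models the inodegc llist */
	int			ino;
};

static struct model_inode *gc_list;
static pthread_mutex_t gc_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Queue an inode for inactivation. The flag is set and the inode added
 * while holding flags_lock, so addition is atomic w.r.t. state changes.
 */
static void inodegc_queue(struct model_inode *ip)
{
	pthread_mutex_lock(&ip->flags_lock);
	ip->flags |= I_NEED_INACTIVE;
	pthread_mutex_lock(&gc_list_lock);
	ip->gc_next = gc_list;
	gc_list = ip;
	pthread_mutex_unlock(&gc_list_lock);
	pthread_mutex_unlock(&ip->flags_lock);
}

/*
 * Lookup-side reactivation: if the inode still needs inactivation, clear
 * the flag and reuse it immediately. It stays on the gc list; the worker
 * will see the clear flag and just drop it (lazy removal).
 */
static bool iget_reactivate(struct model_inode *ip)
{
	bool reactivated = false;

	pthread_mutex_lock(&ip->flags_lock);
	if (ip->flags & I_NEED_INACTIVE) {
		ip->flags &= ~I_NEED_INACTIVE;
		reactivated = true;
	}
	pthread_mutex_unlock(&ip->flags_lock);
	return reactivated;
}

/*
 * The worker detaches the whole list, then decides per inode, under
 * flags_lock, whether inactivation should still run. The worker makes
 * that call, not the context that queued the inode.
 */
static void inodegc_worker(void)
{
	struct model_inode *ip, *next;

	pthread_mutex_lock(&gc_list_lock);
	ip = gc_list;
	gc_list = NULL;
	pthread_mutex_unlock(&gc_list_lock);

	for (; ip; ip = next) {
		next = ip->gc_next;
		ip->gc_next = NULL;

		pthread_mutex_lock(&ip->flags_lock);
		if (!(ip->flags & I_NEED_INACTIVE)) {
			/* Reactivated by a lookup; skip the stale entry. */
			pthread_mutex_unlock(&ip->flags_lock);
			printf("inode %d: reactivated, skipped\n", ip->ino);
			continue;
		}
		ip->flags &= ~I_NEED_INACTIVE;
		pthread_mutex_unlock(&ip->flags_lock);
		printf("inode %d: inactivated\n", ip->ino); /* inactivation runs here */
	}
}

int main(void)
{
	struct model_inode a = { .flags_lock = PTHREAD_MUTEX_INITIALIZER, .ino = 1 };
	struct model_inode b = { .flags_lock = PTHREAD_MUTEX_INITIALIZER, .ino = 2 };

	inodegc_queue(&a);
	inodegc_queue(&b);
	iget_reactivate(&a);	/* a lookup wins the race for inode 1 */
	inodegc_worker();	/* inactivates inode 2, skips inode 1 */
	return 0;
}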
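Running the sketch prints "inode 2: inactivated" for the inode the worker still owns and "inode 1: reactivated, skipped" for the one the lookup won, which mirrors the handshake described above: whoever clears the needs-inactivation flag under the i_flags_lock takes responsibility for the inode, and a stale list entry costs the worker nothing more than a flag test. That property is what makes immediate reactivation safe on a frozen filesystem - the lookup never has to wait for the worker to run.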