Re: [PATCH 08/21] xfs: defer iput on certain inodes while scrub / repair are running

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 29 Jun 2018 09:37:21 +1000

On Sun, Jun 24, 2018 at 12:24:20PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> Destroying an incore inode sometimes requires some work to be done on
> the inode.  For example, post-EOF blocks on a non-PREALLOC inode are
> trimmed, and copy-on-write staging extents are freed.  This work is done
> in separate transactions, which is bad for scrub and repair because (a)
> we already have a transaction and can't nest them, and (b) if we've
> frozen the filesystem for scrub/repair work, that (regular) transaction
> allocation will block on the freeze.
> 
> Therefore, if we detect that work has to be done to destroy the incore
> inode, we'll just hang on to the reference until after the scrub is
> finished.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

Darrick, I'll just repeat what we discussed on #xfs here so we have
it in the archive and everyone else knows why this is probably going
to be done differently.

I think we should move deferred inode inactivation processing into
the background reclaim radix tree walker rather than introduce a
special new "don't iput this inode yet" state. We're really only
trying to prevent the transactions that xfs_inactive() may run
through iput() when the filesystem is frozen, and we already stop
background reclaim processing when the fs is frozen.

I've always intended that xfs_fs_destroy_inode() basically becomes a
no-op that just queues the inode for final inactivation, freeing and
reclaim - right now it only does the reclaim work in the background.
I first proposed this back in ~2008 here:

http://xfs.org/index.php/Improving_inode_Caching#Inode_Unlink

At this point, it really only requires a new inode flag to indicate
that it has an inactivation pending - we set that if xfs_inactive
needs to do work before the inode can be reclaimed, and have a
separate per-ag work queue that walks the inode radix tree finding
reclaimable inodes that have the NEED_INACTIVATION inode flag set.
This way background reclaim doesn't get stuck on them.
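
As a rough illustration of the flag-based scheme described above, here
is a small userspace model in C. It is a sketch only, not kernel code:
the struct, the SKETCH_* flag names and the helper functions are all
hypothetical stand-ins, and the two booleans simply model "xfs_inactive
would have work to do" for the cases mentioned in the patch description
(post-EOF blocks, CoW staging extents).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags; only the NEED_INACTIVATION name comes from the mail. */
#define SKETCH_IRECLAIMABLE      (1u << 0)
#define SKETCH_NEED_INACTIVATION (1u << 1)

/* Hypothetical stand-in for an incore inode. */
struct sketch_inode {
	unsigned int flags;
	bool has_posteof_blocks;	/* would need trimming */
	bool has_cow_extents;		/* would need freeing */
};

/* Models the destroy path: no transactions run here, it only queues. */
void sketch_destroy_inode(struct sketch_inode *ip)
{
	if (ip->has_posteof_blocks || ip->has_cow_extents)
		ip->flags |= SKETCH_NEED_INACTIVATION;
	ip->flags |= SKETCH_IRECLAIMABLE;
}

/* Models background reclaim: skips inodes with inactivation pending. */
size_t sketch_reclaim_pass(struct sketch_inode *inodes, size_t n)
{
	size_t reclaimed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(inodes[i].flags & SKETCH_IRECLAIMABLE))
			continue;
		if (inodes[i].flags & SKETCH_NEED_INACTIVATION)
			continue;	/* left for the inactivation worker */
		inodes[i].flags = 0;
		reclaimed++;
	}
	return reclaimed;
}

/* Models the separate worker: does the deferred work, clears the flag. */
size_t sketch_inactivate_pass(struct sketch_inode *inodes, size_t n)
{
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(inodes[i].flags & SKETCH_NEED_INACTIVATION))
			continue;
		inodes[i].has_posteof_blocks = false;
		inodes[i].has_cow_extents = false;
		inodes[i].flags &= ~SKETCH_NEED_INACTIVATION;
		done++;
	}
	return done;
}
```

In this model an inode that needs inactivation is ignored by the
reclaim pass until the inactivation pass has cleared its flag, after
which the next reclaim pass picks it up.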

This has benefits for many operations, e.g. inode inactivation and
freeing can be processed in bulk, either concurrently with or after
an rm -rf rather than at unlink syscall exit, and the VFS inode cache
shrinker never blocks on inactivation needing to run transactions.

It also allows us to turn off inactivation on a per-AG basis,
meaning that when we are rebuilding an AG structure in repair (e.g.
the rmap btree) we can turn off inode inactivation and reclaim for
that AG rather than needing to freeze the entire filesystem....
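
The per-AG switch could look something like the following userspace
sketch in C. Everything here is an assumption made for illustration:
the sketch_perag struct, its fields and both functions are hypothetical,
and a simple counter stands in for the inodes queued in an AG.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-AG state; not the real per-AG structure. */
struct sketch_perag {
	bool inactivation_enabled;	/* cleared while repair rebuilds this AG */
	size_t pending;			/* inodes queued for inactivation */
};

/* Process one AG's queue, unless inactivation is turned off for it. */
size_t sketch_perag_inactivate(struct sketch_perag *pag)
{
	size_t done;

	if (!pag->inactivation_enabled)
		return 0;	/* queue is left untouched for later */
	done = pag->pending;
	pag->pending = 0;
	return done;
}

/* Walk all AGs; disabled AGs are skipped, the rest make progress. */
size_t sketch_inactivate_all(struct sketch_perag *pags, size_t agcount)
{
	size_t total = 0;

	for (size_t agno = 0; agno < agcount; agno++)
		total += sketch_perag_inactivate(&pags[agno]);
	return total;
}
```

The point of the model is that only the AG under repair stops making
progress; its queued inodes stay pending and are processed once the
switch is turned back on.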

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx