From: Darrick J. Wong <djwong@xxxxxxxxxx>

Back when I wrote commit a03297a0ca9f2, I had thought that we'd be
doing users a favor by only marking inodes dontcache at the end of a
scrub operation, and only if there's only one reference to that inode.
This was more or less true back when I_DONTCACHE was an XFS iflag and
the only thing it did was change the outcome of xfs_fs_drop_inode to 1.
Note: If there are dentries pointing to the inode when scrub finishes,
the inode will have positive i_count and stay around in cache until
dentry reclaim.

But now we have d_mark_dontcache, which causes the inode *and* the
dentries attached to it all to be marked I_DONTCACHE, which means that
we drop the dentries ASAP, which drops the inode ASAP.

This is bad if scrub found problems with the inode, because now it can
be scheduled for inactivation, which can cause inodegc to trip on it
and shut down the filesystem.

Even if the inode isn't bad, this is still suboptimal because phases
3-7 each initiate inode scans.  Dropping the inode immediately during
phase 3 is silly because phase 5 will reload it and drop it
immediately, etc.  It's fine to mark the inodes dontcache, but if there
have been accesses to the file that set up dentries, we should keep
them.

I validated this by setting up ftrace to capture xfs_iget_recycle*
tracepoints and running xfs/285 for 30 seconds.  With current
djwong-wtf I saw ~30,000 recycle events.  I then dropped the
d_mark_dontcache calls and set XFS_IGET_DONTCACHE, and the recycle
events dropped to ~5,000 per 30 seconds.

Therefore, grab the inode with XFS_IGET_DONTCACHE, which only has the
effect of setting I_DONTCACHE for cache misses.  Remove the
d_mark_dontcache call that can happen in xchk_irele.

Fixes: a03297a0ca9f2 ("xfs: manage inode DONTCACHE status at irele time")
Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
---
 fs/xfs/scrub/common.c |   12 +++---------
 fs/xfs/scrub/iscan.c  |   13 +++++++++++--
 fs/xfs/scrub/scrub.h  |    7 +++++++
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 48302532d10d1..c3612419c9f8a 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -782,7 +782,7 @@ xchk_iget(
 {
 	ASSERT(sc->tp != NULL);
 
-	return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
+	return xfs_iget(sc->mp, sc->tp, inum, XCHK_IGET_FLAGS, 0, ipp);
 }
 
 /*
@@ -833,8 +833,8 @@ xchk_iget_agi(
 	if (error)
 		return error;
 
-	error = xfs_iget(mp, tp, inum,
-			XFS_IGET_NORETRY | XFS_IGET_UNTRUSTED, 0, ipp);
+	error = xfs_iget(mp, tp, inum, XFS_IGET_NORETRY | XCHK_IGET_FLAGS, 0,
+			ipp);
 	if (error == -EAGAIN) {
 		/*
 		 * The inode may be in core but temporarily unavailable and may
@@ -1061,12 +1061,6 @@ xchk_irele(
 		spin_lock(&VFS_I(ip)->i_lock);
 		VFS_I(ip)->i_state &= ~I_DONTCACHE;
 		spin_unlock(&VFS_I(ip)->i_lock);
-	} else if (atomic_read(&VFS_I(ip)->i_count) == 1) {
-		/*
-		 * If this is the last reference to the inode and the caller
-		 * permits it, set DONTCACHE to avoid thrashing.
-		 */
-		d_mark_dontcache(VFS_I(ip));
 	}
 
 	xfs_irele(ip);
diff --git a/fs/xfs/scrub/iscan.c b/fs/xfs/scrub/iscan.c
index c380207702e26..cf9d983667cec 100644
--- a/fs/xfs/scrub/iscan.c
+++ b/fs/xfs/scrub/iscan.c
@@ -407,6 +407,15 @@ xchk_iscan_iget_retry(
 	return -EAGAIN;
 }
 
+/*
+ * For an inode scan, we hold the AGI and want to try to grab a batch of
+ * inodes.  Holding the AGI prevents inodegc from clearing freed inodes,
+ * so we must use noretry here.  For every inode after the first one in the
+ * batch, we don't want to wait, so we use retry there too.  Finally, use
+ * dontcache to avoid polluting the cache.
+ */
+#define ISCAN_IGET_FLAGS	(XFS_IGET_NORETRY | XFS_IGET_DONTCACHE)
+
 /*
  * Grab an inode as part of an inode scan.  While scanning this inode, the
  * caller must ensure that no other threads can modify the inode until a call
@@ -434,7 +443,7 @@ xchk_iscan_iget(
 	ASSERT(iscan->__inodes[0] == NULL);
 
 	/* Fill the first slot in the inode array. */
-	error = xfs_iget(sc->mp, sc->tp, ino, XFS_IGET_NORETRY, 0,
+	error = xfs_iget(sc->mp, sc->tp, ino, ISCAN_IGET_FLAGS, 0,
 			&iscan->__inodes[idx]);
 
 	trace_xchk_iscan_iget(iscan, error);
@@ -507,7 +516,7 @@
 
 		ASSERT(iscan->__inodes[idx] == NULL);
 
-		error = xfs_iget(sc->mp, sc->tp, ino, XFS_IGET_NORETRY, 0,
+		error = xfs_iget(sc->mp, sc->tp, ino, ISCAN_IGET_FLAGS, 0,
 				&iscan->__inodes[idx]);
 		if (error)
 			break;
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 4e7e3edb6350c..1da10182f7f42 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -60,6 +60,13 @@ static inline int xchk_maybe_relax(struct xchk_relax *widget)
 #define XCHK_GFP_FLAGS	((__force gfp_t)(GFP_KERNEL | __GFP_NOWARN | \
 					 __GFP_RETRY_MAYFAIL))
 
+/*
+ * For opening files by handle for fsck operations, we don't trust the inumber
+ * or the allocation state; therefore, perform an untrusted lookup.  We don't
+ * want these inodes to pollute the cache, so mark them for immediate removal.
+ */
+#define XCHK_IGET_FLAGS	(XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE)
+
 /* Type info and names for the scrub types. */
 enum xchk_type {
 	ST_NONE = 1,	/* disabled */
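
(Editor's aside, not part of the patch: the claim above that XFS_IGET_DONTCACHE
"only has the effect of setting I_DONTCACHE for cache misses" rests on the fact
that an inode instantiated on a cache miss has no dentries attached yet, so
marking it dontcache at that point cannot tear anything down early.  A rough
sketch of that reasoning follows; the helper name sketch_cache_miss_dontcache
is made up for illustration, and the real logic lives in the cache-miss path of
fs/xfs/xfs_icache.c and differs in detail.)

/*
 * Sketch only, assuming the usual fs/xfs headers: on a cache miss the inode
 * is brand new, so d_mark_dontcache() finds no dentries to drop and reduces
 * to setting I_DONTCACHE on the VFS inode.
 */
static int
sketch_cache_miss_dontcache(
	struct xfs_mount	*mp,
	xfs_ino_t		ino,
	uint			flags,
	struct xfs_inode	**ipp)
{
	struct xfs_inode	*ip;

	ip = xfs_inode_alloc(mp, ino);		/* freshly created, no dentries */

	if (flags & XFS_IGET_DONTCACHE)
		d_mark_dontcache(VFS_I(ip));	/* effectively just I_DONTCACHE */

	*ipp = ip;
	return 0;
}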