Re: [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 10 Aug 2017 13:54:25 -0400

On Thu, Aug 10, 2017 at 10:18:51AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 10, 2017 at 10:51:26AM -0400, Brian Foster wrote:
> > On Wed, Aug 09, 2017 at 10:23:51PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > 
> > > When we introduced the bmap redo log items, we set MS_ACTIVE on the
> > > mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> > > from being truncated prematurely during log recovery.  This also had the
> > > effect of putting linked inodes on the lru instead of evicting them.
> > > 
> > > Unfortunately, we neglected to find all those unreferenced lru inodes
> > > and evict them after finishing log recovery, which means that we leak
> > > them if anything goes wrong in the rest of xfs_mountfs, because the lru
> > > is only cleaned out on unmount.
> > > 
> > > Therefore, look for unreferenced inodes in the lru list immediately
> > > after clearing MS_ACTIVE.  If we find any, igrab/iput them so that
> > > they're evicted from memory.
> > > 
> > > Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_mount.c |   29 +++++++++++++++++++++++++++++
> > >  1 file changed, 29 insertions(+)
> > > 
> > > 
> > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > index d63a367..8da5155 100644
> > > --- a/fs/xfs/xfs_mount.c
> > > +++ b/fs/xfs/xfs_mount.c
> > > @@ -616,6 +616,34 @@ xfs_default_resblks(xfs_mount_t *mp)
> > >  }
> > >  
> > >  /*
> > > + * Find inodes that were put on the LRU during the last part of log
> > > + * recovery that are no longer referenced by anything, and igrab/iput
> > > + * them so that they'll be evicted now.
> > > + *
> > > + * We let all inodes involved in redo item processing end up on the LRU
> > > + * instead of being evicted immediately so that if we do something to an
> > > + * unlinked inode, the irele won't cause premature truncation and
> > > + * freeing of the inode, which results in log recovery failure.  We have
> > > + * to evict the unreferenced lru inodes here because we don't otherwise
> > > + * clean up the lru if there's a subsequent failure in xfs_mountfs,
> > > + * which leads to us leaking the inodes if nothing else (e.g. quotacheck)
> > > + * references the inodes before the failure occurs.
> > > + */
> > > +STATIC void
> > > +xfs_mountfs_evict_log_redo_inodes(
> > > +	struct xfs_mount	*mp)
> > > +{
> > > +	struct super_block	*sb = mp->m_super;
> > > +	struct inode		*inode;
> > > +	struct inode		*next;
> > > +
> > 
> > Perhaps assert that MS_ACTIVE is cleared here as a bit of
> > self-documentation..?
> 
> Huh?  That's the previous patch; MS_ACTIVE should already be clear when
> we get here.
> 

Right.. I mean add something like the following at the top of the
function:

ASSERT(!(mp->m_super->s_flags & MS_ACTIVE));

... so it's clear that this function only serves a purpose when the
filesystem is not active.

> > > +	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> > > +		if (!atomic_read(&inode->i_count))
> > > +			iput(igrab(inode));
> > > +	}
> > 
> > I assume we don't need s_inodes_list_lock here since nothing else should
> > be messing with the list..?
> 
> Nothing should be messing with the list, though the inode_lru_list_del
> in iput_final will take s_inode_list_lock.
> 
> > If so, this looks fine otherwise. Though any reason you didn't end up
> > calling evict_inodes() here? Because it's not exported..?
> 
> I felt there's less messing around with vfs internals by sticking to
> igrab and iput (and letting them deal with the internal inode state)
> since that's what xfs uses elsewhere to manage inode life cycles.
> 

Ok, sounds good:

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

> --D
> 
> > 
> > Brian
> > 
> > > +}
> > > +
> > > +/*
> > >   * This function does the following on an initial mount of a file system:
> > >   *	- reads the superblock from disk and init the mount struct
> > >   *	- if we're a 32-bit kernel, do a size check on the superblock
> > > @@ -960,6 +988,7 @@ xfs_mountfs(
> > >  	mp->m_super->s_flags |= MS_ACTIVE;
> > >  	error = xfs_log_mount_finish(mp);
> > >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > > +	xfs_mountfs_evict_log_redo_inodes(mp);
> > >  	if (error) {
> > >  		xfs_warn(mp, "log mount finish failed");
> > >  		goto out_rtunmount;
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html