Re: [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 10 Aug 2017 10:51:26 -0400

On Wed, Aug 09, 2017 at 10:23:51PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> When we introduced the bmap redo log items, we set MS_ACTIVE on the
> mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> from being truncated prematurely during log recovery.  This also had the
> effect of putting linked inodes on the lru instead of evicting them.
> 
> Unfortunately, we neglected to find all those unreferenced lru inodes
> and evict them after finishing log recovery, which means that we leak
> them if anything goes wrong in the rest of xfs_mountfs, because the lru
> is only cleaned out on unmount.
> 
> Therefore, look for unreferenced inodes in the lru list immediately
> after clearing MS_ACTIVE.  If we find any, igrab/iput them so that
> they're evicted from memory.
> 
> Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_mount.c |   29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index d63a367..8da5155 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -616,6 +616,34 @@ xfs_default_resblks(xfs_mount_t *mp)
>  }
>  
>  /*
> + * Find inodes that were put on the LRU during the last part of log
> + * recovery that are no longer referenced by anything, and igrab/iput
> + * them so that they'll be evicted now.
> + *
> + * We let all inodes involved in redo item processing end up on the LRU
> + * instead of being evicted immediately so that if we do something to an
> + * unlinked inode, the irele won't cause premature truncation and
> + * freeing of the inode, which results in log recovery failure.  We have
> + * to evict the unreferenced lru inodes here because we don't otherwise
> + * clean up the lru if there's a subsequent failure in xfs_mountfs,
> + * which leads to us leaking the inodes if nothing else (e.g. quotacheck)
> + * references the inodes before the failure occurs.
> + */
> +STATIC void
> +xfs_mountfs_evict_log_redo_inodes(
> +	struct xfs_mount	*mp)
> +{
> +	struct super_block	*sb = mp->m_super;
> +	struct inode		*inode;
> +	struct inode		*next;
> +

Perhaps assert that MS_ACTIVE is cleared here as a bit of
self-documentation..?

> +	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> +		if (!atomic_read(&inode->i_count))
> +			iput(igrab(inode));
> +	}

I assume we don't need s_inodes_list_lock here since nothing else should
be messing with the list..? If so, this looks fine otherwise. Though any
reason you didn't end up calling evict_inodes() here? Because it's not
exported..?

Brian

> +}
> +
> +/*
>   * This function does the following on an initial mount of a file system:
>   *	- reads the superblock from disk and init the mount struct
>   *	- if we're a 32-bit kernel, do a size check on the superblock
> @@ -960,6 +988,7 @@ xfs_mountfs(
>  	mp->m_super->s_flags |= MS_ACTIVE;
>  	error = xfs_log_mount_finish(mp);
>  	mp->m_super->s_flags &= ~MS_ACTIVE;
> +	xfs_mountfs_evict_log_redo_inodes(mp);
>  	if (error) {
>  		xfs_warn(mp, "log mount finish failed");
>  		goto out_rtunmount;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html