Re: [PATCH 4/9] xfs: introduce background inode reclaim work

Alex Elder <aelder@xxxxxxx> · Thu, 07 Apr 2011 16:16:22 -0500

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Background inode reclaim needs to run more frequently than the XFS
> syncd work is run, as 30s is too long between optimal reclaim runs.
> Add a new periodic work item to the xfs syncd workqueue to run a
> fast, non-blocking inode reclaim scan.
> 
> Background inode reclaim is kicked by the act of marking inodes for
> reclaim.  When an AG is first marked as having reclaimable inodes,
> the background reclaim work is kicked. It will continue to run
> periodically until it detects that there are no more reclaimable
> inodes. It will be kicked again when the first inode is queued for
> reclaim.
> 
> To ensure shrinker-based inode reclaim throttles to the inode
> cleaning and reclaim rate but still reclaims inodes efficiently,
> make it kick the background inode reclaim so that when we are low on
> memory we are trying to reclaim inodes as efficiently as possible.
> This kick should not be necessary, but it will protect against
> failures to kick the background reclaim when inodes are first
> dirtied.
> 
> To provide the rate throttling, make the shrinker pass do
> synchronous inode reclaim so that it blocks on inodes under IO. This
> means that the shrinker will reclaim inodes rather than just
> skipping over them, but it does not adversely affect the rate of
> reclaim because most dirty inodes are already under IO due to the
> background reclaim work the shrinker kicked.
> 
> These two modifications solve one of the two OOM killer invocations
> Chris Mason reported recently when running a stress testing script.
> The particular workload trigger for the OOM killer invocation is
> where there are more threads than CPUs all unlinking files in an
> extremely memory constrained environment. Unlike other solutions,
> this one does not have a performance impact when memory is not
> constrained or the number of concurrent threads operating is <= the
> number of CPUs.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>

Looks good.

Reviewed-by: Alex Elder <aelder@xxxxxxx>
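
For anyone following along, the kick/requeue lifecycle described in the
commit message can be modelled in a few lines of userspace C. This is
only a sketch under simplifying assumptions: the struct and the model_*
names are hypothetical, a single counter stands in for the per-AG
reclaim tags, and a boolean stands in for the delayed work item. It is
not the patch's implementation.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in XFS. */
struct reclaim_model {
	unsigned int	reclaimable;	/* inodes currently marked for reclaim */
	bool		work_queued;	/* is a background pass pending? */
	unsigned int	kicks;		/* times the background work was queued */
};

/* Queue a pass only if there is work to do and none is already pending. */
static void model_queue_reclaim(struct reclaim_model *m)
{
	if (m->reclaimable == 0 || m->work_queued)
		return;
	m->work_queued = true;
	m->kicks++;
}

/* Marking the first reclaimable inode (0 -> 1) kicks the background work. */
static void model_mark_reclaimable(struct reclaim_model *m)
{
	if (m->reclaimable++ == 0)
		model_queue_reclaim(m);
}

/* One background pass: reclaim up to 'batch' inodes, requeue if any remain. */
static void model_reclaim_worker(struct reclaim_model *m, unsigned int batch)
{
	m->work_queued = false;
	m->reclaimable -= (batch < m->reclaimable) ? batch : m->reclaimable;
	model_queue_reclaim(m);
}
```

In this model the work keeps requeueing itself while anything is left
to reclaim, goes idle once the count reaches zero, and is kicked again
by the next inode marked for reclaim.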

> @@ -470,6 +469,52 @@ xfs_sync_worker(
>  }
>  
>  /*
> + * Queue a new inode reclaim pass if there are reclaimable inodes and there
> + * isn't a reclaim pass already in progress. By default it runs every 5s based
> + * on the xfs syncd work default of 30s. Perhaps this should have its own

Agreed--I was going to say that but then I noticed your comment.

> + * tunable, but that can be done if this method proves to be ineffective or too
> + * aggressive.
> + */
> +static void
> +xfs_syncd_queue_reclaim(
> +	struct xfs_mount        *mp)
> +{
> +
> +	/*
> +	 * We can have inodes enter reclaim after we've shut down the syncd
> +	 * workqueue during unmount, so don't allow reclaim work to be queued
> +	 * during unmount.
> +	 */
> +	if (!(mp->m_super->s_flags & MS_ACTIVE))
> +		return;
> +
> +	rcu_read_lock();
> +	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
> +		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
> +			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));

Probably better to do the multiply before the divide here.
(But whatever... it's heuristic.)
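
To put numbers on it, here is a standalone userspace sketch of the two
orderings. The helper names are made up for illustration and are not
part of the patch; the only thing taken from the patch is the
expression itself. With the 30s default (3000 centisecs) both orderings
give 5000 ms, so the difference only appears when the setting is not a
multiple of 6.

```c
#include <stdio.h>

/* Delay in ms as written in the patch: divide first, then scale. */
static unsigned int reclaim_delay_divide_first(unsigned int centisecs)
{
	return centisecs / 6 * 10;	/* truncates before scaling */
}

/* Same calculation with the multiply done before the divide. */
static unsigned int reclaim_delay_multiply_first(unsigned int centisecs)
{
	return centisecs * 10 / 6;	/* truncates once, at the end */
}

/* Print both results for one (hypothetical) tunable setting. */
static void show_reclaim_delay(unsigned int centisecs)
{
	printf("%u centisecs: divide-first %u ms, multiply-first %u ms\n",
	       centisecs,
	       reclaim_delay_divide_first(centisecs),
	       reclaim_delay_multiply_first(centisecs));
}
```

For a setting of 100 centisecs, divide-first yields 160 ms and
multiply-first yields 166 ms, so the loss from truncation is a few
milliseconds at most, which is why it hardly matters for a heuristic.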

> +	}
> +	rcu_read_unlock();
> +}
> +
. . .
