Re: [PATCH v3] xfs: cache minimum realtime summary level

On Tue, Nov 13, 2018 at 11:28:59AM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@xxxxxx>
> 
> The realtime summary is a two-dimensional array on disk, effectively:
> 
> u32 rsum[log2(number of realtime extents) + 1][number of blocks in the bitmap]
> 
> rsum[log][bbno] is the number of free extents whose size is in the range
> [2**log, 2**(log + 1)) and which start in bitmap block bbno.
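Each level covers a power-of-two range of extent lengths, so a free extent of len realtime extents is counted at level floor(log2(len)). Below is a minimal sketch of that mapping. rsum_level() is a hypothetical helper, not an XFS function, and len is assumed to be at least 1:

```c
#include <stdint.h>

/*
 * Hypothetical helper: summary level for a free extent that is len
 * realtime extents long (len >= 1), i.e. floor(log2(len)).
 */
static unsigned int rsum_level(uint64_t len)
{
	unsigned int level = 0;

	/* Count how many times len can be halved before reaching 1. */
	while (len >>= 1)
		level++;
	return level;
}
```

For example, lengths 2 and 3 both map to level 1, and a 1000-extent run maps to level 9.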
> 
> xfs_rtallocate_extent_near() uses xfs_rtany_summary() to check whether
> rsum[log][bbno] != 0 for any log level in a given range. However, the
> summary array is stored in row-major order (i.e., like an array in C),
> so these entries are not adjacent but are spread across the entire
> summary file. In the worst case (a bitmap block with no free extents),
> xfs_rtany_summary() has to check every level.
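The scattering follows directly from the row-major offset computation. Here is a sketch assuming 32-bit counters and one column per bitmap block (rbmblocks columns). rsum_byte_offset() and rsum_block() are hypothetical names used only for illustration:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an XFS function): byte offset of
 * rsum[log][bbno] in a row-major array of 32-bit counters with
 * rbmblocks columns, one per realtime bitmap block.
 */
static uint64_t rsum_byte_offset(unsigned int log, uint64_t bbno,
				 uint64_t rbmblocks)
{
	return ((uint64_t)log * rbmblocks + bbno) * sizeof(uint32_t);
}

/* Hypothetical helper: summary-file block holding rsum[log][bbno]. */
static uint64_t rsum_block(unsigned int log, uint64_t bbno,
			   uint64_t rbmblocks, uint64_t blocksize)
{
	return rsum_byte_offset(log, bbno, rbmblocks) / blocksize;
}
```

With rbmblocks = 4096 and 4096-byte blocks, rsum[0][7] and rsum[1][7] are 16384 bytes apart and live in summary blocks 0 and 4. Checking every level for a single bitmap block therefore touches a different summary block at each level.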
> 
> This means that on a moderately used realtime device, an allocation will
> waste a lot of time finding, reading, and releasing buffers for the
> realtime summary. In particular, one of our storage services (which runs
> on servers with 8 very slow CPUs and fifteen 8 TB XFS realtime
> filesystems) spends almost 5% of its CPU cycles in xfs_rtbuf_get() and
> xfs_trans_brelse() called from xfs_rtany_summary().
> 
> One solution would be to also store the summary with the dimensions
> swapped. However, this would require a disk format change to a very old
> component of XFS.
> 
> Instead, we can cache, for each bitmap block, the minimum summary level
> that contains any free extents. We do so lazily: rather than
> guaranteeing that the cache contains the precise minimum, it always
> contains a loose lower bound, which we tighten when we read or update a
> summary block. This uses only a few kilobytes of memory (one byte per
> bitmap block), and accesses are already serialized by the realtime
> bitmap and summary inode locks, so the cost is minimal. With this
> change, the same workload spends only 0.2% of its CPU cycles in the
> realtime allocator.
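The lazy lower bound can be modeled in userspace. The sketch below assumes small fixed dimensions. rsum_model, model_modify() and model_any() are hypothetical names, not XFS code. The write-side rule matches the one this patch adds to xfs_rtmodify_summary_int():

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_LEVELS	8	/* assumed number of summary levels */
#define MODEL_BLOCKS	4	/* assumed number of bitmap blocks */

/* Hypothetical in-memory model of the summary and its per-block cache. */
struct rsum_model {
	uint32_t rsum[MODEL_LEVELS][MODEL_BLOCKS];	/* free-extent counts */
	uint8_t cache[MODEL_BLOCKS];	/* lower bound on lowest nonzero level */
};

/* Adjust rsum[log][bbno] by delta, keeping cache[bbno] a valid bound. */
static void model_modify(struct rsum_model *m, unsigned int log,
			 unsigned int bbno, int32_t delta)
{
	uint32_t *sp = &m->rsum[log][bbno];

	*sp += delta;
	if (*sp == 0 && log == m->cache[bbno])
		m->cache[bbno]++;	/* the bounding level just emptied */
	if (*sp != 0 && log < m->cache[bbno])
		m->cache[bbno] = log;	/* a lower level became nonzero */
}

/*
 * Is rsum[log][bbno] nonzero for any log in [low, high]? Levels below
 * cache[bbno] are known to be empty, so the scan skips them.
 */
static bool model_any(struct rsum_model *m, unsigned int low,
		      unsigned int high, unsigned int bbno)
{
	unsigned int start = low < m->cache[bbno] ? m->cache[bbno] : low;
	unsigned int log;
	bool found = false;

	for (log = start; log <= high; log++) {
		if (m->rsum[log][bbno]) {
			found = true;
			break;
		}
	}
	/*
	 * Levels [start, log) were all zero. Every level below log is known
	 * to be empty only if the scan began at the cached bound.
	 */
	if (start == m->cache[bbno] && log > m->cache[bbno])
		m->cache[bbno] = log;
	return found;
}
```

The sketch tightens the bound in model_any() only when the scan started at the cached level (low <= cache[bbno]). If the scan started higher, the levels between the cached bound and low were never examined. Finding nothing in [low, high] therefore says nothing about those levels.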
> 
> Signed-off-by: Omar Sandoval <osandov@xxxxxx>

Looks good, will put this in my tree for 4.21/5.0.

Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

--D

> ---
> Based on Linus' master branch.
> 
> Changes from v2:
> - Allow the cache allocation to fail, in which case we just don't use it
> 
> Changes from v1:
> - Clarify comment in xfs_rtmount_inodes().
> - Use kmem_* instead of kvmalloc/kvfree
> 
>  fs/xfs/libxfs/xfs_rtbitmap.c |  6 ++++++
>  fs/xfs/xfs_mount.h           |  7 +++++++
>  fs/xfs/xfs_rtalloc.c         | 25 +++++++++++++++++++++----
>  3 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
> index b228c821bae6..eaaff67e9626 100644
> --- a/fs/xfs/libxfs/xfs_rtbitmap.c
> +++ b/fs/xfs/libxfs/xfs_rtbitmap.c
> @@ -505,6 +505,12 @@ xfs_rtmodify_summary_int(
>  		uint first = (uint)((char *)sp - (char *)bp->b_addr);
>  
>  		*sp += delta;
> +		if (mp->m_rsum_cache) {
> +			if (*sp == 0 && log == mp->m_rsum_cache[bbno])
> +				mp->m_rsum_cache[bbno]++;
> +			if (*sp != 0 && log < mp->m_rsum_cache[bbno])
> +				mp->m_rsum_cache[bbno] = log;
> +		}
>  		xfs_trans_log_buf(tp, bp, first, first + sizeof(*sp) - 1);
>  	}
>  	if (sum)
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 7964513c3128..39f04aca8c3a 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -89,6 +89,13 @@ typedef struct xfs_mount {
>  	int			m_logbsize;	/* size of each log buffer */
>  	uint			m_rsumlevels;	/* rt summary levels */
>  	uint			m_rsumsize;	/* size of rt summary, bytes */
> +	/*
> +	 * Optional cache of rt summary level per bitmap block with the
> +	 * invariant that m_rsum_cache[bbno] <= the minimum i for which
> +	 * rsum[i][bbno] != 0. Reads and writes are serialized by the rsumip
> +	 * inode lock.
> +	 */
> +	uint8_t			*m_rsum_cache;
>  	struct xfs_inode	*m_rbmip;	/* pointer to bitmap inode */
>  	struct xfs_inode	*m_rsumip;	/* pointer to summary inode */
>  	struct xfs_inode	*m_rootip;	/* pointer to root directory */
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index 926ed314ffba..aefd63d46397 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -64,8 +64,12 @@ xfs_rtany_summary(
>  	int		log;		/* loop counter, log2 of ext. size */
>  	xfs_suminfo_t	sum;		/* summary data */
>  
> +	/* There are no extents at levels < m_rsum_cache[bbno]. */
> +	if (mp->m_rsum_cache && low < mp->m_rsum_cache[bbno])
> +		low = mp->m_rsum_cache[bbno];
> +
>  	/*
> -	 * Loop over logs of extent sizes.  Order is irrelevant.
> +	 * Loop over logs of extent sizes.
>  	 */
>  	for (log = low; log <= high; log++) {
>  		/*
> @@ -80,13 +84,17 @@ xfs_rtany_summary(
>  		 */
>  		if (sum) {
>  			*stat = 1;
> -			return 0;
> +			goto out;
>  		}
>  	}
>  	/*
>  	 * Found nothing, return failure.
>  	 */
>  	*stat = 0;
> +out:
> +	/* There were no extents at levels < log. */
> +	if (mp->m_rsum_cache && log > mp->m_rsum_cache[bbno])
> +		mp->m_rsum_cache[bbno] = log;
>  	return 0;
>  }
>  
> @@ -1187,8 +1195,8 @@ xfs_rtmount_init(
>  }
>  
>  /*
> - * Get the bitmap and summary inodes into the mount structure
> - * at mount time.
> + * Get the bitmap and summary inodes and the summary cache into the mount
> + * structure at mount time.
>   */
>  int					/* error */
>  xfs_rtmount_inodes(
> @@ -1211,6 +1219,14 @@ xfs_rtmount_inodes(
>  		return error;
>  	}
>  	ASSERT(mp->m_rsumip != NULL);
> +	/*
> +	 * The rsum cache is initialized to all zeroes, which is trivially a
> +	 * lower bound on the minimum level with any free extents. We can
> +	 * continue without the cache if it couldn't be allocated.
> +	 */
> +	mp->m_rsum_cache = kmem_zalloc_large(sbp->sb_rbmblocks, KM_SLEEP);
> +	if (!mp->m_rsum_cache)
> +		xfs_warn(mp, "could not allocate realtime summary cache");
>  	return 0;
>  }
>  
> @@ -1218,6 +1234,7 @@ void
>  xfs_rtunmount_inodes(
>  	struct xfs_mount	*mp)
>  {
> +	kmem_free(mp->m_rsum_cache);
>  	if (mp->m_rbmip)
>  		xfs_irele(mp->m_rbmip);
>  	if (mp->m_rsumip)
> -- 
> 2.19.1
> 
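The cache that xfs_rtmount_inodes() allocates is sb_rbmblocks bytes, one per bitmap block. Below is a rough sizing sketch. It assumes one bitmap bit per realtime extent and a device size that is a multiple of the extent size. rsum_cache_bytes() is a hypothetical helper:

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for a one-byte-per-bitmap-block
 * cache, i.e. the number of realtime bitmap blocks, assuming one bitmap
 * bit per realtime extent.
 */
static uint64_t rsum_cache_bytes(uint64_t rtdev_bytes, uint64_t rextsize_bytes,
				 uint64_t blocksize)
{
	uint64_t rextents = rtdev_bytes / rextsize_bytes;
	uint64_t bits_per_block = blocksize * 8;

	/* Round up: a partially used last bitmap block still needs a slot. */
	return (rextents + bits_per_block - 1) / bits_per_block;
}
```

For an 8 TiB realtime device with 4096-byte blocks, this yields 65536 bytes with 4 KiB realtime extents and 4096 bytes with 64 KiB extents. The footprint therefore depends on the configured extent size.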


