Re: [PATCH 8/9] xfs: reap large AG metadata extents when possible

On Thu, May 25, 2023 at 05:45:03PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> When we're freeing extents that have been set in a bitmap, break the
> bitmap extent into multiple sub-extents organized by fate, and reap the
> extents.  This enables us to dispose of old resources more efficiently
> than doing them block by block.
> 
> While we're at it, rename the reaping functions to make it clear that
> they're reaping per-AG extents.
> 
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---

.....
> +xreap_agextent_binval(
> +	struct xreap_state	*rs,
> +	xfs_agblock_t		agbno,
> +	xfs_extlen_t		*aglenp)
>  {
> -	struct xfs_buf		*bp = NULL;
> -	int			error;
> +	struct xfs_scrub	*sc = rs->sc;
> +	struct xfs_perag	*pag = sc->sa.pag;
> +	struct xfs_mount	*mp = sc->mp;
> +	xfs_agnumber_t		agno = sc->sa.pag->pag_agno;
> +	xfs_agblock_t		agbno_next = agbno + *aglenp;
> +	xfs_agblock_t		bno = agbno;
>  
>  	/*
> -	 * If there's an incore buffer for exactly this block, invalidate it.
>  	 * Avoid invalidating AG headers and post-EOFS blocks because we never
>  	 * own those.
>  	 */
> -	if (!xfs_verify_fsbno(sc->mp, fsbno))
> +	if (!xfs_verify_agbno(pag, agbno) ||
> +	    !xfs_verify_agbno(pag, agbno_next - 1))
>  		return;
>  
>  	/*
> -	 * We assume that the lack of any other known owners means that the
> -	 * buffer can be locked without risk of deadlocking.
> +	 * If there are incore buffers for these blocks, invalidate them.  We
> +	 * assume that the lack of any other known owners means that the buffer
> +	 * can be locked without risk of deadlocking.  The buffer cache cannot
> +	 * detect aliasing, so employ nested loops to scan for incore buffers
> +	 * of any plausible size.
>  	 */
> -	error = xfs_buf_incore(sc->mp->m_ddev_targp,
> -			XFS_FSB_TO_DADDR(sc->mp, fsbno),
> -			XFS_FSB_TO_BB(sc->mp, 1), XBF_BCACHE_SCAN, &bp);
> -	if (error)
> -		return;
> -
> -	xfs_trans_bjoin(sc->tp, bp);
> -	xfs_trans_binval(sc->tp, bp);
> +	while (bno < agbno_next) {
> +		xfs_agblock_t	fsbcount;
> +		xfs_agblock_t	max_fsbs;
> +
> +		/*
> +		 * Max buffer size is the max remote xattr buffer size, which
> +		 * is one fs block larger than 64k.
> +		 */
> +		max_fsbs = min_t(xfs_agblock_t, agbno_next - bno,
> +				xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX));
> +
> +		for (fsbcount = 1; fsbcount < max_fsbs; fsbcount++) {
> +			struct xfs_buf	*bp = NULL;
> +			xfs_daddr_t	daddr;
> +			int		error;
> +
> +			daddr = XFS_AGB_TO_DADDR(mp, agno, bno);
> +			error = xfs_buf_incore(mp->m_ddev_targp, daddr,
> +					XFS_FSB_TO_BB(mp, fsbcount),
> +					XBF_BCACHE_SCAN, &bp);
> +			if (error)
> +				continue;
> +
> +			xfs_trans_bjoin(sc->tp, bp);
> +			xfs_trans_binval(sc->tp, bp);
> +			rs->invalidated++;

Hmmm. O(N^2) brute force lookup walk to find any buffer at that
specific daddr?  That's going to have an impact on running systems
by hammering the perag hash lock.
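
To put a rough number on it (assuming 4k blocks, so max_fsbs comes
out at 17 per the "one fs block larger than 64k" comment above): if
I'm reading the loops right, that's ~16 xfs_buf_incore() lookups for
every single block of the extent being reaped, so reaping a
1000-block extent means something like 16,000 cache lookups.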

I didn't know this was being done before I suggested XBF_ANY_MATCH,
but now I'm wondering how we can even have multiple buffers in the
cache at a given address. The whole point of the ASSERT() in the
match function is to indicate that this should never happen.

i.e. xfs_buf_find_insert() uses rhashtable_lookup_get_insert_fast(),
which will return an existing buffer only if it has a matching
length. The compare function used at insert time will throw an
assert failure if any other non-stale buffer exists at that address
with a mismatched length. There is no way to avoid that ASSERT check
on insert.
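
For reference, the relevant check in the buffer cache compare
function (_xfs_buf_obj_cmp() in fs/xfs/xfs_buf.c, paraphrased from
memory, so treat it as a sketch) is roughly:

	if (bp->b_rhash_key != map->bm_bn)
		return 1;
	if (unlikely(bp->b_length != map->bm_len)) {
		/*
		 * Block number matches but the length doesn't.  The
		 * only way that is allowed is if the cached buffer is
		 * stale and the transaction that staled it has not
		 * yet committed - i.e. we are reallocating a busy
		 * extent.  Skip it and keep searching.
		 */
		ASSERT(bp->b_flags & XBF_STALE);
		return 1;
	}
	return 0;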

Hence, AFAICT, the only way we can get multiple buffers into the
cache at the same daddr with different lengths is for all the
existing buffers at that daddr to be stale at insert time.  In
which case, why do we need to invalidate buffers that are already
stale (i.e. have already been invalidated)?

What am I missing here? i.e. this appears to cater for behaviour
that doesn't currently exist in the buffer cache, and I'm not sure
we even want to allow it to occur in the buffer cache given that it
generally indicates in-memory metadata corruption.....

Hmmmm. What if the buffer was already stale? The lookup will then
clear all the stale state from it, leaving it with no valid
contents, which we then invalidate again and log a new buffer cancel
item for. What are the implications of that? (My brain is too full
of stuff trying to understand what this code is doing to be able to
think this through right now.)
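
(For context, the stale state clearing I'm referring to is this bit
of xfs_buf_find_lock() in the lookup path, again paraphrased from
memory:

	if (bp->b_flags & XBF_STALE) {
		ASSERT((bp->b_flags & _XBF_DELWRI_Q) == 0);
		bp->b_flags &= _XBF_KMEM | _XBF_PAGES;
		bp->b_ops = NULL;
	}

so the lookup hands back a buffer with no verifier ops and none of
its previous state.)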

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


