Re: [PATCH 8/9] xfs: reap large AG metadata extents when possible

On Tue, Jun 20, 2023 at 03:47:08PM +1000, Dave Chinner wrote:
> On Thu, May 25, 2023 at 05:45:03PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > When we're freeing extents that have been set in a bitmap, break the
> > bitmap extent into multiple sub-extents organized by fate, and reap the
> > extents.  This enables us to dispose of old resources more efficiently
> > than doing them block by block.
> > 
> > While we're at it, rename the reaping functions to make it clear that
> > they're reaping per-AG extents.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> 
> .....
> > +xreap_agextent_binval(
> > +	struct xreap_state	*rs,
> > +	xfs_agblock_t		agbno,
> > +	xfs_extlen_t		*aglenp)
> >  {
> > -	struct xfs_buf		*bp = NULL;
> > -	int			error;
> > +	struct xfs_scrub	*sc = rs->sc;
> > +	struct xfs_perag	*pag = sc->sa.pag;
> > +	struct xfs_mount	*mp = sc->mp;
> > +	xfs_agnumber_t		agno = sc->sa.pag->pag_agno;
> > +	xfs_agblock_t		agbno_next = agbno + *aglenp;
> > +	xfs_agblock_t		bno = agbno;
> >  
> >  	/*
> > -	 * If there's an incore buffer for exactly this block, invalidate it.
> >  	 * Avoid invalidating AG headers and post-EOFS blocks because we never
> >  	 * own those.
> >  	 */
> > -	if (!xfs_verify_fsbno(sc->mp, fsbno))
> > +	if (!xfs_verify_agbno(pag, agbno) ||
> > +	    !xfs_verify_agbno(pag, agbno_next - 1))
> >  		return;
> >  
> >  	/*
> > -	 * We assume that the lack of any other known owners means that the
> > -	 * buffer can be locked without risk of deadlocking.
> > +	 * If there are incore buffers for these blocks, invalidate them.  We
> > +	 * assume that the lack of any other known owners means that the buffer
> > +	 * can be locked without risk of deadlocking.  The buffer cache cannot
> > +	 * detect aliasing, so employ nested loops to scan for incore buffers
> > +	 * of any plausible size.
> >  	 */
> > -	error = xfs_buf_incore(sc->mp->m_ddev_targp,
> > -			XFS_FSB_TO_DADDR(sc->mp, fsbno),
> > -			XFS_FSB_TO_BB(sc->mp, 1), XBF_BCACHE_SCAN, &bp);
> > -	if (error)
> > -		return;
> > -
> > -	xfs_trans_bjoin(sc->tp, bp);
> > -	xfs_trans_binval(sc->tp, bp);
> > +	while (bno < agbno_next) {
> > +		xfs_agblock_t	fsbcount;
> > +		xfs_agblock_t	max_fsbs;
> > +
> > +		/*
> > +		 * Max buffer size is the max remote xattr buffer size, which
> > +		 * is one fs block larger than 64k.
> > +		 */
> > +		max_fsbs = min_t(xfs_agblock_t, agbno_next - bno,
> > +				xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX));
> > +
> > +		for (fsbcount = 1; fsbcount < max_fsbs; fsbcount++) {
> > +			struct xfs_buf	*bp = NULL;
> > +			xfs_daddr_t	daddr;
> > +			int		error;
> > +
> > +			daddr = XFS_AGB_TO_DADDR(mp, agno, bno);
> > +			error = xfs_buf_incore(mp->m_ddev_targp, daddr,
> > +					XFS_FSB_TO_BB(mp, fsbcount),
> > +					XBF_BCACHE_SCAN, &bp);
> > +			if (error)
> > +				continue;
> > +
> > +			xfs_trans_bjoin(sc->tp, bp);
> > +			xfs_trans_binval(sc->tp, bp);
> > +			rs->invalidated++;
> 
> Hmmm. O(N^2) brute force lookup walk to find any buffer at that
> specific daddr?  That's going to have an impact on running systems
> by hammering the perag hash lock.
> 
> I didn't know this was being done before I suggested XBF_ANY_MATCH,
> but now I'm wondering how can we even have multiple buffers in the
> cache at a given address? The whole point of the ASSERT() in the
> match function is to indicate this should not ever happen.
> 
> i.e. xfs_buf_find_insert() uses rhashtable_lookup_get_insert_fast(),
> which will return an existing buffer only if it has a match length.
> The compare function used at insert time will throw an assert fail
> if any other buffer exists at that address that has a mismatched
> length that is not stale. There is no way to avoid that ASSERT check
> on insert.
> 
> Hence, AFAICT, the only way we can get multiple buffers into the
> cache at the same daddr with different lengths is for all the
> existing buffers at that daddr to all be stale at insert time.  In
> which case, why do we need to invalidate buffers that are already
> stale (i.e. been invalidated)?
> 
> What am I missing here? i.e. this appears to cater for behaviour

The same braino that I mentioned in my previous replies -- I failed to
state that we're looking for buffers anywhere within a specific range of
fsblocks, i.e. buffers that do /not/ necessarily start at the same daddr.
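
Concretely, the probe loop in the patch boils down to something like this
toy user-space model -- the cache contents, block numbers, and the
cache_lookup() helper are all made up for illustration; this is not the
real buffer cache code:

/*
 * Toy model of the probe loop: for every block in the range being
 * reaped, try every plausible buffer length and see whether a cached
 * buffer exists with exactly that (start, length) pair.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_buf {
	unsigned int	start;	/* first block of the cached buffer */
	unsigned int	len;	/* buffer length, in blocks */
	bool		stale;	/* already invalidated? */
};

/* pretend buffer cache: one 8-block buffer hiding inside the range */
static struct toy_buf cache[] = {
	{ .start = 12, .len = 8, .stale = false },
};

/* exact (start, length) lookup, loosely like xfs_buf_incore() */
static struct toy_buf *cache_lookup(unsigned int start, unsigned int len)
{
	for (unsigned int i = 0; i < sizeof(cache) / sizeof(cache[0]); i++)
		if (!cache[i].stale && cache[i].start == start &&
		    cache[i].len == len)
			return &cache[i];
	return NULL;
}

int main(void)
{
	unsigned int	agbno = 10, agbno_next = 26;	/* doomed extent */
	unsigned int	max_len = 17;	/* ~64k remote xattr + 1 block */

	for (unsigned int bno = agbno; bno < agbno_next; bno++) {
		unsigned int limit = agbno_next - bno;

		if (limit > max_len)
			limit = max_len;
		for (unsigned int len = 1; len <= limit; len++) {
			struct toy_buf *bp = cache_lookup(bno, len);

			if (bp) {
				bp->stale = true;	/* "invalidate" */
				printf("invalidated buf start %u len %u\n",
						bno, len);
			}
		}
	}
	return 0;
}

The real loop in xreap_agextent_binval() does the same (bno, length)
probing against the live buffer cache via xfs_buf_incore().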

> that doesn't currently exist in the buffer cache, and I'm not sure
> we even want to allow it to occur in the buffer cache given that it
> generally indicates in-memory metadata corruption.....
> 
> Hmmmm. What if the buffer was already stale? The lookup will then

Urk.  Yes, that results in the buffer's stale state being cleared and
then the buffer being cancelled *again*.  That second invalidation is
unnecessary and we can skip it.

I think I'll change the flag back to its former name, XBF_LIVESCAN,
which means "only return non-stale buffers, and don't complain about
finding something with the same daddr but a different length from what
the caller asked for".
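
To make that concrete, here's a rough sketch of the match semantics I
have in mind -- the types, flag values, and function names below are
toy stand-ins for illustration, not the real struct xfs_buf or its
rhashtable match function:

/*
 * Sketch of the lookup semantics for the renamed flag: stale buffers
 * are never returned, and a length mismatch at the same daddr is
 * skipped quietly instead of tripping the assert.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_LIVESCAN	(1u << 0)	/* stand-in for XBF_LIVESCAN */
#define TOY_STALE	(1u << 1)	/* stand-in for XBF_STALE */

struct toy_buf {
	unsigned long long	daddr;	/* disk address of the buffer */
	unsigned int		length;	/* length in sectors */
	unsigned int		flags;
};

static bool toy_buf_match(const struct toy_buf *bp, unsigned long long daddr,
		unsigned int length, unsigned int lookup_flags)
{
	if (bp->daddr != daddr)
		return false;

	if (lookup_flags & TOY_LIVESCAN) {
		/* live scan: quietly skip stale or differently sized bufs */
		if (bp->flags & TOY_STALE)
			return false;
		return bp->length == length;
	}

	/* regular lookup: a length mismatch is only tolerable if stale */
	if (bp->length != length) {
		assert(bp->flags & TOY_STALE);
		return false;
	}
	return true;
}

int main(void)
{
	struct toy_buf stale_buf = { .daddr = 96, .length = 8,
				     .flags = TOY_STALE };

	/* a live scan skips the stale buffer instead of resurrecting it */
	printf("livescan match: %d\n",
			toy_buf_match(&stale_buf, 96, 8, TOY_LIVESCAN));
	return 0;
}

i.e. a regular lookup would still hand back an already-stale buffer of
matching length (and clear its stale state, per your point above), but
the live-scan path just skips it, so we never invalidate the same
buffer twice.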

--D

> clear all the stale state from it, leave it with no valid contents
> which we then invalidate again and log a new buffer cancel item for
> it. What are the implications of that? (My brain is too full of
> stuff trying to understand what this code is doing to be able to
> think this through right now.)
> 
> -Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx


