Re: [PATCH] xfs: bmap scrub should only scrub records once

On Fri, Aug 23, 2019 at 11:02:21AM -0400, Brian Foster wrote:
> On Fri, Aug 16, 2019 at 07:06:51PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > The inode block mapping scrub function does more work for btree format
> > extent maps than is absolutely necessary -- first it will walk the bmbt
> > and check all the entries, and then it will load the incore tree and
> > check every entry in that tree.
> > 
> > Reduce the run time of the ondisk bmbt walk if the incore tree is loaded
> > by checking that the incore tree has an exact match for the bmbt extent.
> > Similarly, skip the incore tree walk if we have to load it from the
> > bmbt, since we just checked that.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > ---
> >  fs/xfs/scrub/bmap.c |   40 +++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 37 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> > index 1bd29fdc2ab5..6170736fa94f 100644
> > --- a/fs/xfs/scrub/bmap.c
> > +++ b/fs/xfs/scrub/bmap.c
> > @@ -384,6 +384,7 @@ xchk_bmapbt_rec(
> >  	struct xfs_inode	*ip = bs->cur->bc_private.b.ip;
> >  	struct xfs_buf		*bp = NULL;
> >  	struct xfs_btree_block	*block;
> > +	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, info->whichfork);
> >  	uint64_t		owner;
> >  	int			i;
> >  
> > @@ -402,8 +403,30 @@ xchk_bmapbt_rec(
> >  		}
> >  	}
> >  
> > -	/* Set up the in-core record and scrub it. */
> > +	/*
> > +	 * If the incore bmap cache is already loaded, check that it contains
> > +	 * an extent that matches this one exactly.  We validate those cached
> > +	 * bmaps later, so we don't need to check here.
> > +	 *
> > +	 * If the cache is /not/ loaded, we need to validate the bmbt records
> > +	 * now.
> > +	 */
> >  	xfs_bmbt_disk_get_all(&rec->bmbt, &irec);
> > +        if (ifp->if_flags & XFS_IFEXTENTS) {
> 
> ^ looks like whitespace damage right here.

Oops.  Fixed.

> > +		struct xfs_bmbt_irec	iext_irec;
> > +		struct xfs_iext_cursor	icur;
> > +
> > +		if (!xfs_iext_lookup_extent(ip, ifp, irec.br_startoff, &icur,
> > +					&iext_irec) ||
> > +		    irec.br_startoff != iext_irec.br_startoff ||
> > +		    irec.br_startblock != iext_irec.br_startblock ||
> > +		    irec.br_blockcount != iext_irec.br_blockcount ||
> > +		    irec.br_state != iext_irec.br_state)
> > +			xchk_fblock_set_corrupt(bs->sc, info->whichfork,
> > +					irec.br_startoff);
> > +		return 0;
> > +	}
> > +
> 
> Ok, so right now the bmbt walk makes no consideration of in-core state.
> With this change, we correlate every on-disk record with an in-core
> counterpart (if cached) and skip the additional extent checks...
> 
> >  	return xchk_bmap_extent(ip, bs->cur, info, &irec);
> >  }
> >  
> > @@ -671,11 +694,22 @@ xchk_bmap(
> >  	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> >  		goto out;
> >  
> > -	/* Now try to scrub the in-memory extent list. */
> > +	/*
> > +	 * If the incore bmap cache isn't loaded, then this inode has a bmap
> > +	 * btree and we already walked it to check all of the mappings.  Load
> > +	 * the cache now and skip ahead to rmap checking (which requires the
> > +	 * bmap cache to be loaded).  We don't need to check twice.
> > +	 *
> > +	 * If the cache /is/ loaded, then we haven't checked any mappings, so
> > +	 * iterate the incore cache and check the mappings now, because the
> > +	 * bmbt iteration code skipped the checks, assuming that we'd do them
> > +	 * here.
> > +	 */
> >          if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> >  		error = xfs_iread_extents(sc->tp, ip, whichfork);
> >  		if (!xchk_fblock_process_error(sc, whichfork, 0, &error))
> >  			goto out;
> > +		goto out_check_rmap;
> 
> ... because we end up doing that here. Otherwise, the bmbt walk did the
> extent checks, so we can skip it here.

Yep.  On the stress test case (which is bmapbtd checking of mdrestore'd
sparse images of large filesystems), only doing the extent walk + check
once can cut down the runtime by ~30%.

> I think I follow, but I'm a little confused by the need for such split
> logic when we follow up with an unconditional read of the extent tree
> anyways. Maybe I'm missing something, but couldn't we just read the
> extent tree a little earlier and always do the extent checks in one
> place?

The original goal was that if the extent cache isn't loaded, we want to
check the bmbt records before we even bother to call xfs_iread_extents,
so that someone could find out from the trace data exactly where in the
bmbt the corruption was found.

Granted, since we're reducing the scrub code to the bare minimum needed
to decide if something's good or bad due to the primary interface being
a bit field... I could unconditionally load the extent map earlier,
unconditionally check the iext records, and then the bmbt walk only
needs to check that the tree shape is ok and that each bmbt record
corresponds to an iext record.
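
To illustrate that idea, here is a rough userspace sketch of the "each
bmbt record corresponds to an iext record" check.  This is not kernel
code: struct toy_irec, toy_iext_lookup() and
toy_bmbt_rec_matches_cache() are made-up stand-ins, the cache is assumed
to be a sorted array, and the lookup is only loosely modelled on
xfs_iext_lookup_extent (it returns the cached record covering the
offset, or else the next one after it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_bmbt_irec; not the kernel type. */
struct toy_irec {
	uint64_t	br_startoff;	/* file offset, in blocks */
	uint64_t	br_startblock;	/* starting disk block */
	uint64_t	br_blockcount;	/* length, in blocks */
	int		br_state;	/* written/unwritten flag */
};

/*
 * Hypothetical lookup over a cache sorted by br_startoff: return the
 * record covering @off, or failing that the next record after @off.
 * Returns false if there is no such record.
 */
static bool
toy_iext_lookup(
	const struct toy_irec	*cache,
	size_t			nr,
	uint64_t		off,
	struct toy_irec		*got)
{
	size_t			lo = 0, hi = nr;

	assert(cache != NULL || nr == 0);

	/* Binary search for the first record that ends after @off. */
	while (lo < hi) {
		size_t		mid = lo + (hi - lo) / 2;

		if (cache[mid].br_startoff + cache[mid].br_blockcount <= off)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == nr)
		return false;
	*got = cache[lo];
	return true;
}

/* Does the ondisk record have an exact counterpart in the cache? */
static bool
toy_bmbt_rec_matches_cache(
	const struct toy_irec	*cache,
	size_t			nr,
	const struct toy_irec	*irec)
{
	struct toy_irec		got;

	if (!toy_iext_lookup(cache, nr, irec->br_startoff, &got))
		return false;
	return irec->br_startoff == got.br_startoff &&
	       irec->br_startblock == got.br_startblock &&
	       irec->br_blockcount == got.br_blockcount &&
	       irec->br_state == got.br_state;
}
```

In this model the btree walk would call the match helper once per
record and flag the fork as corrupt on a mismatch, leaving all of the
per-extent checks to the single pass over the cache.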

The other way to go would be to convert xchk_bmap_check_rmaps to use a
bmbt cursor if the iext isn't loaded, in which case we wouldn't need to
load the iext cache at all.  That would reduce the kernel slab
perturbations at a cost of extra code complexity.

Thoughts?

--D

> Brian
> 
> >  	}
> >  
> >  	/* Find the offset of the last extent in the mapping. */
> > @@ -689,7 +723,7 @@ xchk_bmap(
> >  	for_each_xfs_iext(ifp, &icur, &irec) {
> >  		if (xchk_should_terminate(sc, &error) ||
> >  		    (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
> > -			break;
> > +			goto out;
> >  		if (isnullstartblock(irec.br_startblock))
> >  			continue;
> >  		if (irec.br_startoff >= endoff) {


