Re: [PATCH 05/15] xfs: support dynamic btree cursor heights

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 13, 2021 at 04:31:22PM +1100, Dave Chinner wrote:
> On Tue, Oct 12, 2021 at 04:33:01PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > Split out the btree level information into a separate struct and put it
> > at the end of the cursor structure as a VLA.  The realtime rmap btree
> > (which is rooted in an inode) will require the ability to support many
> > more levels than a per-AG btree cursor, which means that we're going to
> > create two btree cursor caches to conserve memory for the more common
> > case.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > Reviewed-by: Chandan Babu R <chandan.babu@xxxxxxxxxx>
> > Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c |    6 +-
> >  fs/xfs/libxfs/xfs_bmap.c  |   10 +--
> >  fs/xfs/libxfs/xfs_btree.c |  168 +++++++++++++++++++++++----------------------
> >  fs/xfs/libxfs/xfs_btree.h |   28 ++++++--
> >  fs/xfs/scrub/bitmap.c     |   22 +++---
> >  fs/xfs/scrub/bmap.c       |    2 -
> >  fs/xfs/scrub/btree.c      |   47 +++++++------
> >  fs/xfs/scrub/trace.c      |    7 +-
> >  fs/xfs/scrub/trace.h      |   10 +--
> >  fs/xfs/xfs_super.c        |    2 -
> >  fs/xfs/xfs_trace.h        |    2 -
> >  11 files changed, 164 insertions(+), 140 deletions(-)
> 
> Hmmm - subject of the patch doesn't really match the changes being
> made - there's nothing here that makes the btree cursor heights
> dynamic. It's just a structure layout change...

"xfs: prepare xfs_btree_cur for dynamic cursor heights" ?

> 
> > @@ -415,9 +415,9 @@ xfs_btree_dup_cursor(
> >  	 * For each level current, re-get the buffer and copy the ptr value.
> >  	 */
> >  	for (i = 0; i < new->bc_nlevels; i++) {
> > -		new->bc_ptrs[i] = cur->bc_ptrs[i];
> > -		new->bc_ra[i] = cur->bc_ra[i];
> > -		bp = cur->bc_bufs[i];
> > +		new->bc_levels[i].ptr = cur->bc_levels[i].ptr;
> > +		new->bc_levels[i].ra = cur->bc_levels[i].ra;
> > +		bp = cur->bc_levels[i].bp;
> >  		if (bp) {
> >  			error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
> >  						   xfs_buf_daddr(bp), mp->m_bsize,
> > @@ -429,7 +429,7 @@ xfs_btree_dup_cursor(
> >  				return error;
> >  			}
> >  		}
> > -		new->bc_bufs[i] = bp;
> > +		new->bc_levels[i].bp = bp;
> >  	}
> >  	*ncur = new;
> >  	return 0;
> 
> ObHuh: that dup_cursor code seems like a really obtuse way of doing:
> 
> 	bip = cur->bc_levels[i].bp->b_log_item;
> 	bip->bli_recur++;
> 	new->bc_levels[i] = cur->bc_levels[i];
> 
> But that's not a problem this patch needs to solve. Just something
> that made me go hmmmm...

Yeah, I noticed that too while I was checking the results of my sed
script.

> > @@ -922,11 +922,11 @@ xfs_btree_readahead(
> >  	    (lev == cur->bc_nlevels - 1))
> >  		return 0;
> >  
> > -	if ((cur->bc_ra[lev] | lr) == cur->bc_ra[lev])
> > +	if ((cur->bc_levels[lev].ra | lr) == cur->bc_levels[lev].ra)
> >  		return 0;
> 
> That's whacky logic. Surely that's just:
> 
> 	if (cur->bc_levels[lev].ra & lr)
> 		return 0;

This is an early-exit test, which means the careful check is necessary.

If (some day) someone calls this function with (LEFTRA|RIGHTRA) to
readahead both siblings on a btree level where one sibling has been ra'd
but not the other, we must avoid taking the branch.

> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index 1018bcc43d66..f31f057bec9d 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -212,6 +212,19 @@ struct xfs_btree_cur_ino {
> >  #define	XFS_BTCUR_BMBT_INVALID_OWNER	(1 << 1)
> >  };
> >  
> > +struct xfs_btree_level {
> > +	/* buffer pointer */
> > +	struct xfs_buf		*bp;
> > +
> > +	/* key/record number */
> > +	uint16_t		ptr;
> > +
> > +	/* readahead info */
> > +#define XFS_BTCUR_LEFTRA	1	/* left sibling has been read-ahead */
> > +#define XFS_BTCUR_RIGHTRA	2	/* right sibling has been read-ahead */
> > +	uint16_t		ra;
> > +};
> 
> The ra variable is a bit field. Can we define the values obviously
> as bit fields with (1 << 0) and (1 << 1) instead of 1 and 2?

Done.

> > @@ -242,8 +250,17 @@ struct xfs_btree_cur
> >  		struct xfs_btree_cur_ag	bc_ag;
> >  		struct xfs_btree_cur_ino bc_ino;
> >  	};
> > +
> > +	/* Must be at the end of the struct! */
> > +	struct xfs_btree_level	bc_levels[];
> >  };
> >  
> > +static inline size_t
> > +xfs_btree_cur_sizeof(unsigned int nlevels)
> > +{
> > +	return struct_size((struct xfs_btree_cur *)NULL, bc_levels, nlevels);
> > +}
> 
> Ooooh, yeah, we really need comments explaining how many btree
> levels these VLAs are tracking, because this one doesn't have a "-
> 1" in it like the previous one I commented on....

/*
 * Compute the size of a btree cursor that can handle a btree of a given
 * height.  The bc_levels array handles node and leaf blocks, so its
 * size is exactly nlevels.
 */


> > diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
> > index c0ef53fe6611..816dfc8e5a80 100644
> > --- a/fs/xfs/scrub/trace.c
> > +++ b/fs/xfs/scrub/trace.c
> > @@ -21,10 +21,11 @@ xchk_btree_cur_fsbno(
> >  	struct xfs_btree_cur	*cur,
> >  	int			level)
> >  {
> > -	if (level < cur->bc_nlevels && cur->bc_bufs[level])
> > +	if (level < cur->bc_nlevels && cur->bc_levels[level].bp)
> >  		return XFS_DADDR_TO_FSB(cur->bc_mp,
> > -				xfs_buf_daddr(cur->bc_bufs[level]));
> > -	if (level == cur->bc_nlevels - 1 && cur->bc_flags & XFS_BTREE_LONG_PTRS)
> > +				xfs_buf_daddr(cur->bc_levels[level].bp));
> > +	else if (level == cur->bc_nlevels - 1 &&
> > +		 cur->bc_flags & XFS_BTREE_LONG_PTRS)
> 
> No need for an else there as the first if () clause returns.
> Also, needs more () around that "a & b" second line.

TBH I think we check the wrong flag, and that last bit should be:

	if (level == cur->bc_nlevels - 1 &&
	    (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE))
		return XFS_INO_TO_FSB(cur->bc_mp, cur->bc_ino.ip->i_ino);

	return NULLFSBLOCK;

But for now I'll stick to the straight replacement and tack on another
patch to fix that.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux