Re: [PATCH 08/10] xfs: hoist the code that moves the incore inode fork broot memory

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Thu, 29 Aug 2024 16:35:07 -0700

On Thu, Aug 29, 2024 at 12:54:32PM +1000, Dave Chinner wrote:
> On Tue, Aug 27, 2024 at 04:35:48PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > Whenever we change the size of the memory buffer holding an inode fork
> > btree root block, we have to copy the contents over.  Refactor all this
> > into a single function that handles both, in preparation for making
> > xfs_iroot_realloc more generic.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_inode_fork.c |   87 ++++++++++++++++++++++++++--------------
> >  1 file changed, 56 insertions(+), 31 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> > index 60646a6c32ec7..307207473abdb 100644
> > --- a/fs/xfs/libxfs/xfs_inode_fork.c
> > +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> > @@ -387,6 +387,50 @@ xfs_iroot_free(
> >  	ifp->if_broot = NULL;
> >  }
> >  
> > +/* Move the bmap btree root from one incore buffer to another. */
> > +static void
> > +xfs_ifork_move_broot(
> > +	struct xfs_inode	*ip,
> > +	int			whichfork,
> > +	struct xfs_btree_block	*dst_broot,
> > +	size_t			dst_bytes,
> > +	struct xfs_btree_block	*src_broot,
> > +	size_t			src_bytes,
> > +	unsigned int		numrecs)
> > +{
> > +	struct xfs_mount	*mp = ip->i_mount;
> > +	void			*dptr;
> > +	void			*sptr;
> > +
> > +	ASSERT(xfs_bmap_bmdr_space(src_broot) <= xfs_inode_fork_size(ip, whichfork));
> 
> We pass whichfork just for this debug check. Can you pull this up
> to the callers?

I guess I could do that, but the rtrmap patchset adds its own broot
shrink/grow function specific to rtrmap btree inodes:

static void
xfs_rtrmapbt_broot_move(...)
{
	...
	ASSERT(xfs_rtrmap_droot_space(src_broot) <=
	       xfs_inode_fork_size(ip, whichfork));

so I didn't want to add yet another indirect call just for an assertion.

> > +
> > +	/*
> > +	 * We always have to move the pointers because they are not butted
> > +	 * against the btree block header.
> > +	 */
> > +	if (numrecs) {
> > +		sptr = xfs_bmap_broot_ptr_addr(mp, src_broot, 1, src_bytes);
> > +		dptr = xfs_bmap_broot_ptr_addr(mp, dst_broot, 1, dst_bytes);
> > +		memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t));
> > +	}
> > +
> > +	if (src_broot == dst_broot)
> > +		return;
> 
> Urk. So this is encoding caller logic directly into this function.
> i.e. the grow case uses krealloc(), which copies the keys and
> pointers but still needs the pointers moved. The buffer is large
> enough for that, so it passes src and dst as the same buffer and
> this code then jumps out after copying the ptrs (a second time) to
> their final resting place.

<nod>
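
To make the double handling of the pointers concrete, here is a rough userspace sketch of the grow path. It assumes a toy root layout (8-byte header, 8-byte keys, 8-byte pointers) and plain realloc() standing in for krealloc(); every toy_* name is made up for illustration and none of this is the real XFS code. realloc preserves the header and keys at their old offsets, so the only fixup is sliding the pointers to the offset implied by the new buffer size:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_HDR_BYTES	8u	/* toy header size, not the real XFS value */

/* Number of key/pointer pairs that fit in a root buffer of this size. */
static size_t toy_maxrecs(size_t bytes)
{
	return (bytes - TOY_HDR_BYTES) / (2 * sizeof(uint64_t));
}

/* Keys are packed directly after the header. */
static uint64_t *toy_key_addr(void *root)
{
	return (uint64_t *)((char *)root + TOY_HDR_BYTES);
}

/* Pointers sit after room for maxrecs keys, so their offset tracks size. */
static uint64_t *toy_ptr_addr(void *root, size_t bytes)
{
	return toy_key_addr(root) + toy_maxrecs(bytes);
}

/*
 * Grow the root in place.  Returns the (possibly moved) buffer, or NULL if
 * realloc fails, in which case the old buffer is left untouched.
 */
static void *toy_grow_root(void *root, size_t old_bytes, size_t new_bytes,
			   size_t numrecs)
{
	void *p;

	assert(old_bytes >= TOY_HDR_BYTES && new_bytes >= old_bytes);
	assert(numrecs <= toy_maxrecs(old_bytes));

	p = realloc(root, new_bytes);
	if (!p)
		return NULL;

	/*
	 * The header and keys are already where they need to be.  The
	 * pointers are still at the offset for the old size; move them to
	 * the offset for the new size.  memmove tolerates overlap.
	 */
	memmove(toy_ptr_addr(p, new_bytes), toy_ptr_addr(p, old_bytes),
		numrecs * sizeof(uint64_t));
	return p;
}
```

In this toy layout, growing a two-record root to hold four records leaves the keys at byte 8 and moves the pointers from byte 24 to byte 40.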

> > +	/*
> > +	 * If the root is being totally relocated, we have to migrate the block
> > +	 * header and the keys that come after it.
> > +	 */
> > +	memcpy(dst_broot, src_broot, xfs_bmbt_block_len(mp));
> > +
> > +	/* Now copy the keys, which come right after the header. */
> > +	if (numrecs) {
> > +		sptr = xfs_bmbt_key_addr(mp, src_broot, 1);
> > +		dptr = xfs_bmbt_key_addr(mp, dst_broot, 1);
> > +		memcpy(dptr, sptr, numrecs * sizeof(struct xfs_bmbt_key));
> > +	}
> 
> And here we do the key copy for the shrink case, where we technically
> don't need separate buffers. We really want to minimise memory usage
> if we can, though, so we reallocate a smaller buffer and free the
> original larger one.
> 
> Given this, I think this code is more natural by doing all the
> allocate/free/copy ourselves instead of using krealloc() and its
> implicit copy for one of the cases.
> 
> i.e. rename this function xfs_ifork_realloc_broot() and make it do
> this:
> 
> {
> 	struct xfs_btree_block *src = ifp->if_broot;
> 	struct xfs_btree_block *dst = NULL;
> 
> 	if (!numrecs)
> 		goto out_free_src;
> 
> 	dst = kmalloc(new_size, GFP_KERNEL | __GFP_NOFAIL);
> 
> 	/* copy block header */
> 	memcpy(dst, src, xfs_bmbt_block_len(mp));

I'm not sure I like replacing krealloc with kmalloc here.  For a grow
operation, if the new and old object sizes are close enough that we
reuse the existing slab object, then we only have to move the pointers.
In the best case, the object expands, so all the bytes we had before are
still live and we touch fewer cachelines.  In the worst case we get a
new object, but that's roughly exponential.

For a shrink operation, we definitely want the alloc -> copy -> free
logic because there's no way to guarantee that krealloc-down isn't a nop
operation, which wastes memory.
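
For comparison, a minimal userspace sketch of the alloc -> copy -> free shrink, under the same kind of assumptions: a toy layout (8-byte header, 8-byte keys and pointers), malloc()/free() standing in for the kernel allocator, and hypothetical sk_* names that do not exist in XFS:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_HDR_BYTES	8u	/* toy header size, not the real XFS value */

/* Number of key/pointer pairs that fit in a root buffer of this size. */
static size_t sk_maxrecs(size_t bytes)
{
	return (bytes - SK_HDR_BYTES) / (2 * sizeof(uint64_t));
}

/* Keys are packed directly after the header. */
static uint64_t *sk_keys(void *root)
{
	return (uint64_t *)((char *)root + SK_HDR_BYTES);
}

/* Pointers follow the space reserved for maxrecs keys. */
static uint64_t *sk_ptrs(void *root, size_t bytes)
{
	return sk_keys(root) + sk_maxrecs(bytes);
}

/*
 * Shrink *rootp to dst_bytes, keeping the first numrecs records.
 * Returns 0 on success or -1 if allocation fails (old root left intact).
 */
static int sk_shrink_root(void **rootp, size_t src_bytes, size_t dst_bytes,
			  size_t numrecs)
{
	void *src = *rootp;
	void *dst;

	/* Shrinking to nothing: just drop the old buffer. */
	if (numrecs == 0) {
		free(src);
		*rootp = NULL;
		return 0;
	}

	assert(dst_bytes >= SK_HDR_BYTES && dst_bytes <= src_bytes);
	assert(numrecs <= sk_maxrecs(dst_bytes));

	dst = malloc(dst_bytes);
	if (!dst)
		return -1;

	/* Header first, then keys, then pointers at their new offset. */
	memcpy(dst, src, SK_HDR_BYTES);
	memcpy(sk_keys(dst), sk_keys(src), numrecs * sizeof(uint64_t));
	memcpy(sk_ptrs(dst, dst_bytes), sk_ptrs(src, src_bytes),
	       numrecs * sizeof(uint64_t));

	/* Only now is it safe to give the larger buffer back. */
	free(src);
	*rootp = dst;
	return 0;
}
```

Because the destination is a fresh allocation of exactly dst_bytes, the memory saving does not depend on what the allocator does with a shrinking realloc.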

But I see that this function isn't very cohesive and could be split into
separate ->grow and ->shrink functions that do their own allocations.

Or, seeing how the only caller of xfs_iroot_realloc is the btree code
itself, maybe I should just refactor this into a single ->broot_realloc
function in the btree ops, which will cut out a lot of indirect calls
from the iroot code.

Yeah.  I'm gonna go do that.  Disregard patch 5 onwards.
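
Roughly the shape I have in mind, as a userspace sketch only. The struct layouts, record sizes, and every demo_* name here are placeholders rather than what the final patches will look like, and the key/pointer shuffling is left out so that only the dispatch is visible:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_HDR_BYTES	8u	/* placeholder header size */

/* Stand-in for the incore fork: just the root buffer and its size. */
struct demo_fork {
	void	*broot;
	size_t	broot_bytes;
};

/* Stand-in for the btree ops: one indirect call per root resize. */
struct demo_btree_ops {
	int (*broot_realloc)(struct demo_fork *ifp, unsigned int new_numrecs);
};

/* Shared helper: resize (or free) the buffer and record the new size. */
static int demo_resize(struct demo_fork *ifp, size_t new_bytes)
{
	void *p;

	if (new_bytes == 0) {
		free(ifp->broot);
		ifp->broot = NULL;
		ifp->broot_bytes = 0;
		return 0;
	}

	p = realloc(ifp->broot, new_bytes);
	if (!p)
		return -1;
	ifp->broot = p;
	ifp->broot_bytes = new_bytes;
	return 0;
}

/* Each btree type knows its own record geometry (sizes are made up). */
static int demo_bmap_broot_realloc(struct demo_fork *ifp, unsigned int n)
{
	return demo_resize(ifp, n ? DEMO_HDR_BYTES + n * 16u : 0);
}

static int demo_rtrmap_broot_realloc(struct demo_fork *ifp, unsigned int n)
{
	return demo_resize(ifp, n ? DEMO_HDR_BYTES + n * 32u : 0);
}

static const struct demo_btree_ops demo_bmap_ops = {
	.broot_realloc	= demo_bmap_broot_realloc,
};

static const struct demo_btree_ops demo_rtrmap_ops = {
	.broot_realloc	= demo_rtrmap_broot_realloc,
};

/* Generic btree code only needs the ops pointer and the record count. */
static int demo_btree_resize_root(const struct demo_btree_ops *ops,
				  struct demo_fork *ifp, unsigned int numrecs)
{
	return ops->broot_realloc(ifp, numrecs);
}
```

The generic code never computes a byte size itself; the per-type callback owns both the geometry and the allocation strategy.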

> 	/* copy keys */
> 	sptr = xfs_bmbt_key_addr(mp, src, 1);
> 	dptr = xfs_bmbt_key_addr(mp, dst, 1);
> 	memcpy(dptr, sptr, numrecs * sizeof(struct xfs_bmbt_key));
> 
> 	/* copy pointers */
> 	sptr = xfs_bmap_broot_ptr_addr(mp, src, 1, old_size);
> 	dptr = xfs_bmap_broot_ptr_addr(mp, dst, 1, new_size);
> 	memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t));
> 
> out_free_src:
> 	kfree(src);
> 	ifp->if_broot = dst;
> 	ifp->if_broot_bytes = new_size;
> }
> 
> And the callers are now both:
> 
> 	xfs_ifork_realloc_broot(mp, ifp, new_size, old_size, numrecs);
> 
> This also naturally handles the "reduce to zero size" without
> needing any special case code, it avoids the double pointer copy on
> grow, and the operation logic is simple, obvious and easy to
> understand...

Hmm.  The rtrmap patchset starts by moving xfs_ifork_move_broot to
xfs_bmap_btree.c and virtualizes the broot grow/shrink operation so that
it becomes a per-btree-type operation.  The rtreflink series expands this
usage.

--D

> -Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>