Re: [PATCH 2/2] xfs: Fix log reservation calculation for xattr insert operation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tuesday, January 14, 2020 2:56 AM Darrick J. Wong wrote: 
> On Fri, Jan 10, 2020 at 09:59:53AM +0530, Chandan Rajendra wrote:
> > Log space reservation for xattr insert operation can be divided into two
> > parts,
> > 1. Mount time
> >    - Inode
> >    - Superblock for accounting space allocations
> >    - AGF for accounting space used be count, block number, rmapbt and refcnt
> >      btrees.
> > 
> > 2. The remaining log space can only be calculated at run time because,
> >    - A local xattr can be large enough to cause a double split of the dabtree.
> >    - The value of the xattr can be large enough to be stored in remote
> >      blocks. The contents of the remote blocks are not logged.
> > 
> >    The log space reservation would be,
> >    - 2 * XFS_DA_NODE_MAXDEPTH number of blocks. Additional XFS_DA_NODE_MAXDEPTH
> >      number of blocks are required if xattr is large enough to cause another
> >      split of the dabtree path from root to leaf block.
> >    - BMBT blocks for storing (2 * XFS_DA_NODE_MAXDEPTH) record
> >      entries. Additional XFS_DA_NODE_MAXDEPTH number of blocks are required in
> >      case of a double split of the dabtree path from root to leaf blocks.
> >    - Space for logging blocks of count, block number, rmap and refcnt btrees.
> > 
> > This commit refactors xfs_attr_calc_size() to calculate the log reservation
> > space and also the FS reservation space. It then replaces the erroneous
> > calculation inside xfs_log_calc_max_attrsetm_res() with an invocation of
> > xfs_attr_calc_size().
> 
> Uh, what was the error that you saw?

In xfs_log_calc_max_attrsetm_res(), we have,

1. XFS_DAENTER_SPACE_RES() calculates the number of blocks required to add
   an xattr. It comprises of,
   - XFS_DA_NODE_MAXDEPTH
     Number of blocks required for a single split of the dabtree from the root
     to leaf.
   - XFS_DA_NODE_MAXDEPTH * XFS_BM_MAXLEVELS
     For each dabtree block allocated, the above statement assumes that we
     would need to allocate (in the worst case) blocks from the root to
     the leaf of a bmbt tree.
   This does not take into consideration the double-split that is
   possible for large sized local xattrs.
2. XFS_NEXTENTADD_SPACE_RES() once again calculates the number of bmbt blocks
   that needs to be reserved. This is already taken into consideration by
   XFS_DAENTER_SPACE_RES() macro.

   Also, the second argument to XFS_NEXTENTADD_SPACE_RES() is in units
   of bytes. This should have been specified in units of blocks.

   This is what led me to start looking into this function more thoroughly
   i.e. there was no workload that failed with the existing logic.

3. We then multiply the summation of the above numbers with the runtime
   reservation i.e. superblock + nr blocks from root to leaf of a bmbt. 
   - Superblock log space reservation is required only once per transaction.
     However, Here we end up reserving superblock log space
     for each of the blocks that we had calculated.
   - Number of blocks from root to leaf of a bmbt.
     XFS_DAENTER_SPACE_RES() already computed the number of bmbt blocks
     that needs to be allocated.
     The number of log blocks required to be reserved in the worst case should
     be (XFS_BM_MAXLEVELS + number of bmbt blocks computed by
     XFS_DAENTER_SPACE_RES()).
> 
> > Signed-off-by: Chandan Rajendra <chandanrlinux@xxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c       | 107 +++++++++++++++++++++------------
> >  fs/xfs/libxfs/xfs_attr.h       |   4 +-
> >  fs/xfs/libxfs/xfs_log_rlimit.c |  15 ++---
> >  fs/xfs/libxfs/xfs_trans_resv.c |  34 ++---------
> >  fs/xfs/libxfs/xfs_trans_resv.h |   2 +
> >  5 files changed, 86 insertions(+), 76 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 1eae1db74f6c..067661e61286 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -183,43 +183,6 @@ xfs_attr_get(
> >  	return 0;
> >  }
> >  
> > -/*
> > - * Calculate how many blocks we need for the new attribute,
> > - */
> > -STATIC int
> > -xfs_attr_calc_size(
> > -	struct xfs_da_args	*args,
> > -	int			*local)
> > -{
> > -	struct xfs_mount	*mp = args->dp->i_mount;
> > -	int			size;
> > -	int			nblks;
> > -
> > -	/*
> > -	 * Determine space new attribute will use, and if it would be
> > -	 * "local" or "remote" (note: local != inline).
> > -	 */
> > -	size = xfs_attr_leaf_newentsize(mp, args->namelen, args->valuelen,
> > -					local);
> > -	nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
> > -	if (*local) {
> > -		if (size > (args->geo->blksize / 2)) {
> > -			/* Double split possible */
> > -			nblks *= 2;
> > -		}
> > -	} else {
> > -		/*
> > -		 * Out of line attribute, cannot double split, but
> > -		 * make room for the attribute value itself.
> > -		 */
> > -		uint	dblocks = xfs_attr3_rmt_blocks(mp, args->valuelen);
> > -		nblks += dblocks;
> > -		nblks += XFS_NEXTENTADD_SPACE_RES(mp, dblocks, XFS_ATTR_FORK);
> > -	}
> > -
> > -	return nblks;
> > -}
> > -
> >  STATIC int
> >  xfs_attr_try_sf_addname(
> >  	struct xfs_inode	*dp,
> > @@ -248,6 +211,69 @@ xfs_attr_try_sf_addname(
> >  	return error ? error : error2;
> >  }
> >  
> > +/*
> > + * Calculate how many blocks we need for the new attribute,
> > + */
> > +void
> > +xfs_attr_calc_size(
> > +	struct xfs_mount	*mp,
> > +	int			namelen,
> > +	int			valuelen,
> > +	int			*local,
> > +	unsigned int		*log_blks,
> 
> I see the xfs_calc_buf_res() calls below, which means this value ends up
> with the number of *log bytes* needed to log all the blocks we think
> we're going to need for the new attr.

Sorry, My bad. I will fix this up.

> 
> > +	unsigned int		*total_blks)
> > +{
> > +	unsigned int		blksize;
> > +	int			dabtree_blks;
> > +	int			bmbt_blks;
> > +	int			size;
> > +	int			dblks;
> > +
> > +	blksize = mp->m_dir_geo->blksize;
> > +	dblks = 0;
> > +	*log_blks = 0;
> > +	*total_blks = 0;
> > +
> > +	/*
> > +	 * Determine space new attribute will use, and if it would be
> > +	 * "local" or "remote" (note: local != inline).
> > +	 */
> > +	size = xfs_attr_leaf_newentsize(mp, namelen, valuelen, local);
> > +
> > +	dabtree_blks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK);
> > +	bmbt_blks = XFS_DAENTER_BMAPS(mp, XFS_ATTR_FORK);
> 
> Ok, so this calculates the number of blocks we need to add one attr
> block to the dabtree + whatever dabtree split we might need to insert
> the leaf block + whatever bmbt splits we might need to insert each of
> the new dabtree blocks into the attr fork.
> 
> I guess this calculation handles the case where we have to add an attr
> leaf block to the dabtree...?

Yes, That is correct.

> 
> > +
> > +	*log_blks = xfs_calc_buf_res(2 * dabtree_blks, blksize);
> > +	*log_blks += xfs_calc_buf_res(2 * bmbt_blks, XFS_FSB_TO_B(mp, 1));
> 
> Why reserve log space for twice as many blocks as we just calculated?

Lets say that a leaf block is split. This would mean that the contents of two
blocks would have changed. Hence we would need to log both the blocks. In the
worst case we would need to log changes for all the splits from root block to
the leaf block.

> 
> > +	if (*local) {
> > +		if (size > (blksize / 2)) {
> > +			/* Double split possible */
> > +			*log_blks += xfs_calc_buf_res(dabtree_blks, blksize);
> > +			*log_blks += xfs_calc_buf_res(bmbt_blks,
> > +						XFS_FSB_TO_B(mp, 1));
> > +
> > +			dabtree_blks *= 2;
> > +			bmbt_blks *= 2;
> 
> This appears to be the new version of the old code "nblks *= 2", which
> doubled the resource counts if it anticipates a possible second dabtree
> split (i.e. the attr entry size is more than half a block), right?

Yes, that is correct.

> 
> You really ought to use the proper macro to update bmbt_blocks instead
> of assuming that doubling it will be fine.

XFS_NEXTENTADD_SPACE_RES() seems to be right macro to invoke.

> 
> > +		}
> > +	} else {
> > +		/*
> > +		 * Out of line attribute, cannot double split, but
> > +		 * make room for the attribute value itself.
> > +		 */
> > +		dblks = xfs_attr3_rmt_blocks(mp, valuelen);
> > +		bmbt_blks += XFS_NEXTENTADD_SPACE_RES(mp, dblks, XFS_ATTR_FORK);
> > +		*log_blks += xfs_calc_buf_res(bmbt_blks, XFS_FSB_TO_B(mp, 1));
> 
> This adds enough log bytes reservation to handle bmbt splits to add all
> the (unlogged) remote attr value blocks to the attr fork...
> 
> > +	}
> > +
> > +	*log_blks += xfs_calc_buf_res(xfs_allocfree_log_count(mp, dabtree_blks),
> > +				XFS_FSB_TO_B(mp, 1));
> 
> ...this adds log bytes reservation for all the AG space btree updates
> that might be necessary to all all those blocks to the dabtree...
> 
> > +	*log_blks += xfs_calc_buf_res(xfs_allocfree_log_count(mp, dblks),
> > +				XFS_FSB_TO_B(mp, 1));
> 
> ...and this one does the same for all the bmbt blocks needed to map in
> the new dabtree blocks.
> 
> > +	*total_blks = dabtree_blks + bmbt_blks + dblks;
> 
> This calculation is the worst case number of blocks we think we'll need,
> which gets fed to _trans_alloc as well as args.total.  This is the same
> behavior as before this patchset.

Yes, The behaviour is same for filesystem space reservation
calculation. However, The idea behind this patch is to make
xfs_attr_calc_size() to also calculate the log space reservation.

Without this patch, xfs_log_calc_max_attrsetm_res() would return 555768 blocks
as the log space reservation. After fixing the second argument of
XFS_NEXTENTADD_SPACE_RES() to be a "block count" instead of "byte count",
xfs_log_calc_max_attrsetm_res() would return 262904 blocks as the log space
reservation.

With this patch applied (along with converting the return value of
xfs_calc_buf_res() to blocks), xfs_log_calc_max_attrsetm_res() returns 3187
blocks as the log space reservation.

> 
> Before this series, we compute the number of bytes of log space needed
> to record all the new metadata:
> 
> 	tr_attrsetm.tr_logres + (args.total * tr_attrsetrt.tr_logres)
> 
> tr_attrsetm.tr_logres is large enough to log the first of the dabtree
> splits, but ... doesn't seem to do so for the double split case?  Hmm.
> 
> tr_attrsetrt.tr_logres is set to the number of log bytes needed
> to log the AGF and BMBT updates needed to add one block to the tree, but
> that doesn't factor in all of the log bytes needed to record the changes
> to the bnobt, cntbt, and rmapbt.
> 
> So my guess is that the problem you saw was that you're running some
> workload that exercises the attribute setting routines heavily, and at
> some point the dabtree gets full enough and/or the fs fragments enough
> that we exceed the log bytes reservation and the log goes offline?

Actually, I found the flaw in xfs_log_calc_max_attrsetm_res() when debugging
an xfstests failure after extending the per-inode attribute extent counter to
32-bits.

> 
> Assuming I've gotten all that correct, I think I see a better way to
> structure this.  For one thing, I think we should keep the log space
> reservation calculations functions in xfs_trans_resv.c and not spread
> them into xfs_attr.c.
> 
> xfs_attr_calc_size should return both the (max) number of *dabtree*
> blocks that we're going to log and the (max) number of *dabtree* blocks
> that we could be allocating.  Next, add a pair of functions to
> xfs_trans_resv.c to compute the actual log space and log blocks
> reservations given the above two outputs.
> 
> xfs_calc_attr_res(log_dablocks, total_dablocks)
> {
> 	bmbt_blks = XFS_NEXTENTADD_SPACE_RES(total_dablocks)
> 	space_blks = xfs_allocfree_log_count(total_dablocks + bmbt_blks)
> 
> 	return tr_attrsetm.tr_logres +
> 	       xfs_calc_buf_res(log_dablocks, blksize) +
> 	       xfs_calc_buf_res(bmbt_blks, blksize) +
> 	       xfs_calc_buf_res(space_blks, blksize) +
> }
> 
> xfs_calc_attr_blocks(total_dablocks)
> {
> 	return total_dablocks + XFS_NEXTENTADD_SPACE_RES(total_dablocks)
> }
> 
> and fix the atttrsetm.tr_logres calculation not to include the dablocks
> reservation.

Yes, the above looks correct. I will make the required changes and post the
report the patches. Thanks for the review comments.

-- 
chandan






[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux