Re: [PATCH V4 RESEND 7/7] xfs: Fix log reservation calculation for xattr insert operation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thursday, February 27, 2020 12:20 AM Brian Foster wrote: 
> On Wed, Feb 26, 2020 at 04:51:13PM +0530, Chandan Rajendra wrote:
> > On Tuesday, February 25, 2020 10:49 PM Brian Foster wrote: 
> > > On Mon, Feb 24, 2020 at 09:30:44AM +0530, Chandan Rajendra wrote:
> > > > Log space reservation for xattr insert operation can be divided into two
> > > > parts,
> > > > 1. Mount time
> > > >    - Inode
> > > >    - Superblock for accounting space allocations
> > > >    - AGF for accounting space used by count, block number, rmap and refcnt
> > > >      btrees.
> > > > 
> > > > 2. The remaining log space can only be calculated at run time because,
> > > >    - A local xattr can be large enough to cause a double split of the dabtree.
> > > >    - The value of the xattr can be large enough to be stored in remote
> > > >      blocks. The contents of the remote blocks are not logged.
> > > > 
> > > >    The log space reservation could be,
> > > >    - 2 * XFS_DA_NODE_MAXDEPTH number of blocks. Additional XFS_DA_NODE_MAXDEPTH
> > > >      number of blocks are required if xattr is large enough to cause another
> > > >      split of the dabtree path from root to leaf block.
> > > >    - BMBT blocks for storing (2 * XFS_DA_NODE_MAXDEPTH) record
> > > >      entries. Additional XFS_DA_NODE_MAXDEPTH number of blocks are required in
> > > >      case of a double split of the dabtree path from root to leaf blocks.
> > > >    - Space for logging blocks of count, block number, rmap and refcnt btrees.
> > > > 
> > > > Presently, mount time log reservation includes block count required for a
> > > > single split of the dabtree. The dabtree block count is also taken into
> > > > account by xfs_attr_calc_size().
> > > > 
> > > > Also, AGF log space reservation isn't accounted for.
> > > > 
> > > > Due to the reasons mentioned above, log reservation calculation for xattr
> > > > insert operation gives an incorrect value.
> > > > 
> > > > Apart from the above, xfs_log_calc_max_attrsetm_res() passes byte count as
> > > > an argument to XFS_NEXTENTADD_SPACE_RES() instead of block count.
> > > > 
> > > > To fix these issues, this commit changes xfs_attr_calc_size() to also
> > > > calculate the number of dabtree blocks that need to be logged.
> > > > 
> > > > xfs_attr_set() uses the following values computed by xfs_attr_calc_size()
> > > > 1. The number of dabtree blocks that need to be logged.
> > > > 2. The number of remote blocks that need to be allocated.
> > > > 3. The number of dabtree blocks that need to be allocated.
> > > > 4. The number of bmbt blocks that need to be allocated.
> > > > 5. The total number of blocks that need to be allocated.
> > > > 
> > > > ... to compute number of bytes that need to be reserved in the log.
> > > > 
> > > > This commit also modifies xfs_log_calc_max_attrsetm_res() to invoke
> > > > xfs_attr_calc_size() to obtain the number of blocks to be logged which it uses
> > > > to figure out the total number of bytes to be logged.
> > > > 
> > > > Signed-off-by: Chandan Rajendra <chandanrlinux@xxxxxxxxx>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_attr.c       |  7 +++--
> > > >  fs/xfs/libxfs/xfs_attr.h       |  3 ++
> > > >  fs/xfs/libxfs/xfs_log_rlimit.c | 16 +++++------
> > > >  fs/xfs/libxfs/xfs_trans_resv.c | 50 ++++++++++++++++------------------
> > > >  fs/xfs/libxfs/xfs_trans_resv.h |  2 ++
> > > >  5 files changed, 41 insertions(+), 37 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > index 1d62ce80d7949..f056f8597ee03 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > @@ -182,9 +182,12 @@ xfs_attr_calc_size(
> > > >  	size = xfs_attr_leaf_newentsize(mp->m_attr_geo, namelen, valuelen,
> > > >  			local);
> > > >  	resv->total_dablks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK);
> > > > +	resv->log_dablks = 2 * resv->total_dablks;
> > > > +
> > > >  	if (*local) {
> > > >  		if (size > (blksize / 2)) {
> > > >  			/* Double split possible */
> > > > +			resv->log_dablks += resv->total_dablks;
> > > 
> > > I'm not quite clear on why log is double the total above, then we add an
> > > increment of total here. Can you elaborate on this?
> > 
> > - resv->total_dablks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK); 
> > 
> >   This gives the number of dabtree blocks that we might need to allocate. If
> >   we do indeed allocate resv->total_dablks worth of blocks then it would mean
> >   that we would have split blocks from the root node to the leaf node of the
> >   dabtree. We would then need to log the following blocks,
> >   - Blocks belonging to the original path from root node to the leaf node.
> >   - The new set of blocks from the root node to the leaf node which resulted
> >     from splitting the corresponding blocks in the original path.
> >   Thus the number of blocks to be logged is 2 * resv->total_dablks.
> > 
> 
> Got it.
> 
> > - Double split
> >   If the size of the xattr is large enough to cause another split, then we
> >   would need yet another set of new blocks for representing the path from root
> >   to the leaf.  This is taken into consideration by 'resv->total_dablks *= 2'.
> >   These new blocks might also need to logged. So resv->log_dablks should be
> >   incremented by number of blocks b/w root node and the leaf. I now feel that
> >   the statement could be rewritten in the following manner, 'resv->log_dablks
> >   += XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK)' if that makes it easier to
> >   understand.
> > 
> 
> Yeah, it might be clearer overall to not intertwine the fields.
> 
> > > 
> > > >  			resv->total_dablks *= 2;
> > > >  		}
> > > >  		resv->rmt_blks = 0;
> > > > @@ -349,9 +352,7 @@ xfs_attr_set(
> > > >  				return error;
> > > >  		}
> > > >  
> > > > -		tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
> > > > -				 M_RES(mp)->tr_attrsetrt.tr_logres *
> > > > -					args->total;
> > > > +		tres.tr_logres = xfs_calc_attr_res(mp, &resv);
> > > 
> > > So instead of using the runtime reservation calculation, we call this
> > > new function that uses the block res structure and calculates log
> > > reservation for us. Seems like a decent cleanup, but I'm still having
> > > some trouble determining what is cleanup vs. what is fixing the res
> > > calculation.
> > 
> > Log reservation calculation for xattr set operation in XFS is
> > incorrect because,
> > 
> > - xfs_attr_set() computes the total log space reservation as the
> >   sum of 
> >   1. Mount time log reservation (M_RES(mp)->tr_attrsetm.tr_logres)
> >      Here, XFS incorrectly includes the number of blocks from root
> >      node to the leaf of a dabtree. This is incorrect because the
> >      number of such paths (from root node to leaf) depends on the
> >      size of the xattr. If the xattr is local but large enough to
> >      cause a double split then we would need three times the
> >      number of blocks in a path from root node to leaf. Hence the
> >      number of dabtree blocks that need to be logged is something
> >      that cannot be calculated at mount time.
> > 
> 
> Ok, so the dabtree portion doesn't properly account for dabtree splits
> and cannot do so as part of a mount time calculation.
> 
> >      Also, the mount log reservation calcuation does not account
> >      for log space required for logging the AGF.
> >  
> 
> Ok.
>     
> >   2. Run time log reservatation
> >      This is calculated by multiplying two components,
> >      - The first component consists of sum of the following
> >        - Superblock
> >        - The number of bytes for logging a single path from root
> >          to leaf of a bmbt tree.
> >      - The second component consists of,
> >        - The number of dabtree blocks
> >        - The number of bmbt blocks required for mapping new
> >          dabtree blocks.
> >      Here Bmbt blocks gets accounted twice. Also it does not make sense to
> >      multiply these two components.
> > 
> 
> Hm, Ok. I think I follow, though I'm still curious if this was a
> reproducible problem or not.
> 
> > The other bug is related to calcuation performed in
> > xfs_log_calc_max_attrsetm_res(),
> > 1. The call to XFS_DAENTER_SPACE_RES() macro returns the number of
> >    blocks required for a single split of dabtree and the
> >    corresponding number of bmbt blocks required for mapping the new
> >    dabtree blocks.
> > 2. The space occupied by the attribute value is treated as if it
> >    were a remote attribute and the space for it along with the
> >    corresponding bmbt blocks required is added to the number of
> >    blocks obtained in step 1.
> > 3. This sum is then multiplied with the runtime reservation. Here
> >    again Bmbt blocks gets accounted twice, once from the 'run time
> >    reservation' and the other from the call to
> >    XFS_DAENTER_SPACE_RES().
> > 4. The result from step 3 is added to 'mount time log
> >    reservation'. This means that the log space reservation is
> >    accounted twice for dabtree blocks.
> > 
> > To solve these problems, the fix consists of,
> > 1. Mounting time log reservation now includes only
> >    - Inode
> >    - Superblock
> >    - AGF
> >    i.e. we don't consider the dabtree logging space requirement
> >    when calculating mount time log reservation.
> > 
> > 2. Run time log reservation
> >    To calculate run time log reservation we change xfs_attr_calc_size()
> >    to return, 
> >    1. Number of Dabtree blocks required.
> >    2. Number of Dabtree blocks that might need to be logged.
> >    3. Number of remote blocks.
> >    4. Number of Bmbt blocks that might be required.
> > 
> 
> Can we further break down some of these changes into smaller patches
> that either fix one issue or do some refactoring? For example, one patch
> could move the existing rt reservation out of ->tr_attrsetrt (removing
> it) and establish the new xfs_calc_attr_res() function without changing
> the reservation. Another patch could move the dabtree component from the
> mount time to the runtime calculation. It's not clear to me if the
> individual components of the new rt calculation can be further broken
> down from there without risk of transient regression, but it seems we
> should at least be able to rework the reservation calculation in a
> single patch that does so from first principles with the necessary
> refactoring work to accommodate the proper mount/run time components
> already done..
> 

Sure, I will try to break it down into as many logical patches as possible.

-- 
chandan






[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux