On Thursday, February 27, 2020 12:20 AM Brian Foster wrote: > On Wed, Feb 26, 2020 at 04:51:13PM +0530, Chandan Rajendra wrote: > > On Tuesday, February 25, 2020 10:49 PM Brian Foster wrote: > > > On Mon, Feb 24, 2020 at 09:30:44AM +0530, Chandan Rajendra wrote: > > > > Log space reservation for xattr insert operation can be divided into two > > > > parts, > > > > 1. Mount time > > > > - Inode > > > > - Superblock for accounting space allocations > > > > - AGF for accounting space used by count, block number, rmap and refcnt > > > > btrees. > > > > > > > > 2. The remaining log space can only be calculated at run time because, > > > > - A local xattr can be large enough to cause a double split of the dabtree. > > > > - The value of the xattr can be large enough to be stored in remote > > > > blocks. The contents of the remote blocks are not logged. > > > > > > > > The log space reservation could be, > > > > - 2 * XFS_DA_NODE_MAXDEPTH number of blocks. Additional XFS_DA_NODE_MAXDEPTH > > > > number of blocks are required if xattr is large enough to cause another > > > > split of the dabtree path from root to leaf block. > > > > - BMBT blocks for storing (2 * XFS_DA_NODE_MAXDEPTH) record > > > > entries. Additional XFS_DA_NODE_MAXDEPTH number of blocks are required in > > > > case of a double split of the dabtree path from root to leaf blocks. > > > > - Space for logging blocks of count, block number, rmap and refcnt btrees. > > > > > > > > Presently, mount time log reservation includes block count required for a > > > > single split of the dabtree. The dabtree block count is also taken into > > > > account by xfs_attr_calc_size(). > > > > > > > > Also, AGF log space reservation isn't accounted for. > > > > > > > > Due to the reasons mentioned above, log reservation calculation for xattr > > > > insert operation gives an incorrect value. > > > > > > > > Apart from the above, xfs_log_calc_max_attrsetm_res() passes byte count as > > > > an argument to XFS_NEXTENTADD_SPACE_RES() instead of block count. > > > > > > > > To fix these issues, this commit changes xfs_attr_calc_size() to also > > > > calculate the number of dabtree blocks that need to be logged. > > > > > > > > xfs_attr_set() uses the following values computed by xfs_attr_calc_size() > > > > 1. The number of dabtree blocks that need to be logged. > > > > 2. The number of remote blocks that need to be allocated. > > > > 3. The number of dabtree blocks that need to be allocated. > > > > 4. The number of bmbt blocks that need to be allocated. > > > > 5. The total number of blocks that need to be allocated. > > > > > > > > ... to compute number of bytes that need to be reserved in the log. > > > > > > > > This commit also modifies xfs_log_calc_max_attrsetm_res() to invoke > > > > xfs_attr_calc_size() to obtain the number of blocks to be logged which it uses > > > > to figure out the total number of bytes to be logged. > > > > > > > > Signed-off-by: Chandan Rajendra <chandanrlinux@xxxxxxxxx> > > > > --- > > > > fs/xfs/libxfs/xfs_attr.c | 7 +++-- > > > > fs/xfs/libxfs/xfs_attr.h | 3 ++ > > > > fs/xfs/libxfs/xfs_log_rlimit.c | 16 +++++------ > > > > fs/xfs/libxfs/xfs_trans_resv.c | 50 ++++++++++++++++------------------ > > > > fs/xfs/libxfs/xfs_trans_resv.h | 2 ++ > > > > 5 files changed, 41 insertions(+), 37 deletions(-) > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c > > > > index 1d62ce80d7949..f056f8597ee03 100644 > > > > --- a/fs/xfs/libxfs/xfs_attr.c > > > > +++ b/fs/xfs/libxfs/xfs_attr.c > > > > @@ -182,9 +182,12 @@ xfs_attr_calc_size( > > > > size = xfs_attr_leaf_newentsize(mp->m_attr_geo, namelen, valuelen, > > > > local); > > > > resv->total_dablks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK); > > > > + resv->log_dablks = 2 * resv->total_dablks; > > > > + > > > > if (*local) { > > > > if (size > (blksize / 2)) { > > > > /* Double split possible */ > > > > + resv->log_dablks += resv->total_dablks; > > > > > > I'm not quite clear on why log is double the total above, then we add an > > > increment of total here. Can you elaborate on this? > > > > - resv->total_dablks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK); > > > > This gives the number of dabtree blocks that we might need to allocate. If > > we do indeed allocate resv->total_dablks worth of blocks then it would mean > > that we would have split blocks from the root node to the leaf node of the > > dabtree. We would then need to log the following blocks, > > - Blocks belonging to the original path from root node to the leaf node. > > - The new set of blocks from the root node to the leaf node which resulted > > from splitting the corresponding blocks in the original path. > > Thus the number of blocks to be logged is 2 * resv->total_dablks. > > > > Got it. > > > - Double split > > If the size of the xattr is large enough to cause another split, then we > > would need yet another set of new blocks for representing the path from root > > to the leaf. This is taken into consideration by 'resv->total_dablks *= 2'. > > These new blocks might also need to logged. So resv->log_dablks should be > > incremented by number of blocks b/w root node and the leaf. I now feel that > > the statement could be rewritten in the following manner, 'resv->log_dablks > > += XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK)' if that makes it easier to > > understand. > > > > Yeah, it might be clearer overall to not intertwine the fields. > > > > > > > > resv->total_dablks *= 2; > > > > } > > > > resv->rmt_blks = 0; > > > > @@ -349,9 +352,7 @@ xfs_attr_set( > > > > return error; > > > > } > > > > > > > > - tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres + > > > > - M_RES(mp)->tr_attrsetrt.tr_logres * > > > > - args->total; > > > > + tres.tr_logres = xfs_calc_attr_res(mp, &resv); > > > > > > So instead of using the runtime reservation calculation, we call this > > > new function that uses the block res structure and calculates log > > > reservation for us. Seems like a decent cleanup, but I'm still having > > > some trouble determining what is cleanup vs. what is fixing the res > > > calculation. > > > > Log reservation calculation for xattr set operation in XFS is > > incorrect because, > > > > - xfs_attr_set() computes the total log space reservation as the > > sum of > > 1. Mount time log reservation (M_RES(mp)->tr_attrsetm.tr_logres) > > Here, XFS incorrectly includes the number of blocks from root > > node to the leaf of a dabtree. This is incorrect because the > > number of such paths (from root node to leaf) depends on the > > size of the xattr. If the xattr is local but large enough to > > cause a double split then we would need three times the > > number of blocks in a path from root node to leaf. Hence the > > number of dabtree blocks that need to be logged is something > > that cannot be calculated at mount time. > > > > Ok, so the dabtree portion doesn't properly account for dabtree splits > and cannot do so as part of a mount time calculation. > > > Also, the mount log reservation calcuation does not account > > for log space required for logging the AGF. > > > > Ok. > > > 2. Run time log reservatation > > This is calculated by multiplying two components, > > - The first component consists of sum of the following > > - Superblock > > - The number of bytes for logging a single path from root > > to leaf of a bmbt tree. > > - The second component consists of, > > - The number of dabtree blocks > > - The number of bmbt blocks required for mapping new > > dabtree blocks. > > Here Bmbt blocks gets accounted twice. Also it does not make sense to > > multiply these two components. > > > > Hm, Ok. I think I follow, though I'm still curious if this was a > reproducible problem or not. > > > The other bug is related to calcuation performed in > > xfs_log_calc_max_attrsetm_res(), > > 1. The call to XFS_DAENTER_SPACE_RES() macro returns the number of > > blocks required for a single split of dabtree and the > > corresponding number of bmbt blocks required for mapping the new > > dabtree blocks. > > 2. The space occupied by the attribute value is treated as if it > > were a remote attribute and the space for it along with the > > corresponding bmbt blocks required is added to the number of > > blocks obtained in step 1. > > 3. This sum is then multiplied with the runtime reservation. Here > > again Bmbt blocks gets accounted twice, once from the 'run time > > reservation' and the other from the call to > > XFS_DAENTER_SPACE_RES(). > > 4. The result from step 3 is added to 'mount time log > > reservation'. This means that the log space reservation is > > accounted twice for dabtree blocks. > > > > To solve these problems, the fix consists of, > > 1. Mounting time log reservation now includes only > > - Inode > > - Superblock > > - AGF > > i.e. we don't consider the dabtree logging space requirement > > when calculating mount time log reservation. > > > > 2. Run time log reservation > > To calculate run time log reservation we change xfs_attr_calc_size() > > to return, > > 1. Number of Dabtree blocks required. > > 2. Number of Dabtree blocks that might need to be logged. > > 3. Number of remote blocks. > > 4. Number of Bmbt blocks that might be required. > > > > Can we further break down some of these changes into smaller patches > that either fix one issue or do some refactoring? For example, one patch > could move the existing rt reservation out of ->tr_attrsetrt (removing > it) and establish the new xfs_calc_attr_res() function without changing > the reservation. Another patch could move the dabtree component from the > mount time to the runtime calculation. It's not clear to me if the > individual components of the new rt calculation can be further broken > down from there without risk of transient regression, but it seems we > should at least be able to rework the reservation calculation in a > single patch that does so from first principles with the necessary > refactoring work to accommodate the proper mount/run time components > already done.. > Sure, I will try to break it down into as many logical patches as possible. -- chandan