On Tuesday, February 25, 2020 10:49 PM Brian Foster wrote: > On Mon, Feb 24, 2020 at 09:30:44AM +0530, Chandan Rajendra wrote: > > Log space reservation for xattr insert operation can be divided into two > > parts, > > 1. Mount time > > - Inode > > - Superblock for accounting space allocations > > - AGF for accounting space used by count, block number, rmap and refcnt > > btrees. > > > > 2. The remaining log space can only be calculated at run time because, > > - A local xattr can be large enough to cause a double split of the dabtree. > > - The value of the xattr can be large enough to be stored in remote > > blocks. The contents of the remote blocks are not logged. > > > > The log space reservation could be, > > - 2 * XFS_DA_NODE_MAXDEPTH number of blocks. Additional XFS_DA_NODE_MAXDEPTH > > number of blocks are required if xattr is large enough to cause another > > split of the dabtree path from root to leaf block. > > - BMBT blocks for storing (2 * XFS_DA_NODE_MAXDEPTH) record > > entries. Additional XFS_DA_NODE_MAXDEPTH number of blocks are required in > > case of a double split of the dabtree path from root to leaf blocks. > > - Space for logging blocks of count, block number, rmap and refcnt btrees. > > > > Presently, mount time log reservation includes block count required for a > > single split of the dabtree. The dabtree block count is also taken into > > account by xfs_attr_calc_size(). > > > > Also, AGF log space reservation isn't accounted for. > > > > Due to the reasons mentioned above, log reservation calculation for xattr > > insert operation gives an incorrect value. > > > > Apart from the above, xfs_log_calc_max_attrsetm_res() passes byte count as > > an argument to XFS_NEXTENTADD_SPACE_RES() instead of block count. > > > > To fix these issues, this commit changes xfs_attr_calc_size() to also > > calculate the number of dabtree blocks that need to be logged. > > > > xfs_attr_set() uses the following values computed by xfs_attr_calc_size() > > 1. The number of dabtree blocks that need to be logged. > > 2. The number of remote blocks that need to be allocated. > > 3. The number of dabtree blocks that need to be allocated. > > 4. The number of bmbt blocks that need to be allocated. > > 5. The total number of blocks that need to be allocated. > > > > ... to compute number of bytes that need to be reserved in the log. > > > > This commit also modifies xfs_log_calc_max_attrsetm_res() to invoke > > xfs_attr_calc_size() to obtain the number of blocks to be logged which it uses > > to figure out the total number of bytes to be logged. > > > > Signed-off-by: Chandan Rajendra <chandanrlinux@xxxxxxxxx> > > --- > > fs/xfs/libxfs/xfs_attr.c | 7 +++-- > > fs/xfs/libxfs/xfs_attr.h | 3 ++ > > fs/xfs/libxfs/xfs_log_rlimit.c | 16 +++++------ > > fs/xfs/libxfs/xfs_trans_resv.c | 50 ++++++++++++++++------------------ > > fs/xfs/libxfs/xfs_trans_resv.h | 2 ++ > > 5 files changed, 41 insertions(+), 37 deletions(-) > > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c > > index 1d62ce80d7949..f056f8597ee03 100644 > > --- a/fs/xfs/libxfs/xfs_attr.c > > +++ b/fs/xfs/libxfs/xfs_attr.c > > @@ -182,9 +182,12 @@ xfs_attr_calc_size( > > size = xfs_attr_leaf_newentsize(mp->m_attr_geo, namelen, valuelen, > > local); > > resv->total_dablks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK); > > + resv->log_dablks = 2 * resv->total_dablks; > > + > > if (*local) { > > if (size > (blksize / 2)) { > > /* Double split possible */ > > + resv->log_dablks += resv->total_dablks; > > I'm not quite clear on why log is double the total above, then we add an > increment of total here. Can you elaborate on this? - resv->total_dablks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK); This gives the number of dabtree blocks that we might need to allocate. If we do indeed allocate resv->total_dablks worth of blocks then it would mean that we would have split blocks from the root node to the leaf node of the dabtree. We would then need to log the following blocks, - Blocks belonging to the original path from root node to the leaf node. - The new set of blocks from the root node to the leaf node which resulted from splitting the corresponding blocks in the original path. Thus the number of blocks to be logged is 2 * resv->total_dablks. - Double split If the size of the xattr is large enough to cause another split, then we would need yet another set of new blocks for representing the path from root to the leaf. This is taken into consideration by 'resv->total_dablks *= 2'. These new blocks might also need to logged. So resv->log_dablks should be incremented by number of blocks b/w root node and the leaf. I now feel that the statement could be rewritten in the following manner, 'resv->log_dablks += XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK)' if that makes it easier to understand. > > > resv->total_dablks *= 2; > > } > > resv->rmt_blks = 0; > > @@ -349,9 +352,7 @@ xfs_attr_set( > > return error; > > } > > > > - tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres + > > - M_RES(mp)->tr_attrsetrt.tr_logres * > > - args->total; > > + tres.tr_logres = xfs_calc_attr_res(mp, &resv); > > So instead of using the runtime reservation calculation, we call this > new function that uses the block res structure and calculates log > reservation for us. Seems like a decent cleanup, but I'm still having > some trouble determining what is cleanup vs. what is fixing the res > calculation. Log reservation calculation for xattr set operation in XFS is incorrect because, - xfs_attr_set() computes the total log space reservation as the sum of 1. Mount time log reservation (M_RES(mp)->tr_attrsetm.tr_logres) Here, XFS incorrectly includes the number of blocks from root node to the leaf of a dabtree. This is incorrect because the number of such paths (from root node to leaf) depends on the size of the xattr. If the xattr is local but large enough to cause a double split then we would need three times the number of blocks in a path from root node to leaf. Hence the number of dabtree blocks that need to be logged is something that cannot be calculated at mount time. Also, the mount log reservation calcuation does not account for log space required for logging the AGF. 2. Run time log reservatation This is calculated by multiplying two components, - The first component consists of sum of the following - Superblock - The number of bytes for logging a single path from root to leaf of a bmbt tree. - The second component consists of, - The number of dabtree blocks - The number of bmbt blocks required for mapping new dabtree blocks. Here Bmbt blocks gets accounted twice. Also it does not make sense to multiply these two components. The other bug is related to calcuation performed in xfs_log_calc_max_attrsetm_res(), 1. The call to XFS_DAENTER_SPACE_RES() macro returns the number of blocks required for a single split of dabtree and the corresponding number of bmbt blocks required for mapping the new dabtree blocks. 2. The space occupied by the attribute value is treated as if it were a remote attribute and the space for it along with the corresponding bmbt blocks required is added to the number of blocks obtained in step 1. 3. This sum is then multiplied with the runtime reservation. Here again Bmbt blocks gets accounted twice, once from the 'run time reservation' and the other from the call to XFS_DAENTER_SPACE_RES(). 4. The result from step 3 is added to 'mount time log reservation'. This means that the log space reservation is accounted twice for dabtree blocks. To solve these problems, the fix consists of, 1. Mounting time log reservation now includes only - Inode - Superblock - AGF i.e. we don't consider the dabtree logging space requirement when calculating mount time log reservation. 2. Run time log reservation To calculate run time log reservation we change xfs_attr_calc_size() to return, 1. Number of Dabtree blocks required. 2. Number of Dabtree blocks that might need to be logged. 3. Number of remote blocks. 4. Number of Bmbt blocks that might be required. The values returned by xfs_attr_calc_size() is used by - xfs_attr_set() - To compute total number of blocks required for attr set operation. - To compute the total log reservation (via a call to xfs_calc_attr_res()). - This uses dabtree blocks and bmbt blocks as input to calculate log space reservation for free space trees. - The total log space reservation is then computed by adding up - Mount time log reservation space. - dabtree blocks that need to logged and - free space tree blocks required. - xfs_log_calc_max_attrsetm_res() - To compute the log reservation space for a maximum sized local xattr. It uses the helper function xfs_calc_attr_res() accomplish this. i.e. xfs_attr_calc_size() is now a helper function which returns various sizes (dabtree blocks, dabtree blocks that need to be logged, bmbt blocks, and remote blocks) that could be used by other functions (xfs_attr_set() and xfs_log_calc_max_attrsetm_res()) to obtain various block reservation values associated with xattrs. One of the cleanups included in this patch was the addition of xfs_calc_attr_res() function so that the reservation code is limited to xfs_trans_resv.c. Please let me know if the above explaination does not answer your question satisfactorily. > > > tres.tr_logcount = XFS_ATTRSET_LOG_COUNT; > > tres.tr_logflags = XFS_TRANS_PERM_LOG_RES; > > total = args->total; > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h > > index 0e387230744c3..83508148bbd12 100644 > > --- a/fs/xfs/libxfs/xfs_attr.h > > +++ b/fs/xfs/libxfs/xfs_attr.h > > @@ -74,6 +74,9 @@ struct xfs_attr_list_context { > > }; > > > > struct xfs_attr_set_resv { > > + /* Number of blocks in the da btree that we might need to log. */ > > + unsigned int log_dablks; > > + > > /* Number of unlogged blocks needed to store the remote attr value. */ > > unsigned int rmt_blks; > > > > diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c > > index 7f55eb3f36536..a132ffa7adf32 100644 > > --- a/fs/xfs/libxfs/xfs_log_rlimit.c > > +++ b/fs/xfs/libxfs/xfs_log_rlimit.c > > @@ -10,6 +10,7 @@ > > #include "xfs_log_format.h" > > #include "xfs_trans_resv.h" > > #include "xfs_mount.h" > > +#include "xfs_attr.h" > > #include "xfs_da_format.h" > > #include "xfs_trans_space.h" > > #include "xfs_da_btree.h" > > @@ -21,19 +22,18 @@ > > */ > > STATIC int > > xfs_log_calc_max_attrsetm_res( > > - struct xfs_mount *mp) > > + struct xfs_mount *mp) > > { > > - int size; > > - int nblks; > > + struct xfs_attr_set_resv resv; > > + int size; > > + int local; > > > > size = xfs_attr_leaf_entsize_local_max(mp->m_attr_geo->blksize) - > > MAXNAMELEN - 1; > > - nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK); > > - nblks += XFS_B_TO_FSB(mp, size); > > - nblks += XFS_NEXTENTADD_SPACE_RES(mp, size, XFS_ATTR_FORK); > > + xfs_attr_calc_size(mp, &resv, size, 0, &local); > > + ASSERT(local == 1); > > > > - return M_RES(mp)->tr_attrsetm.tr_logres + > > - M_RES(mp)->tr_attrsetrt.tr_logres * nblks; > > + return xfs_calc_attr_res(mp, &resv); > > } > > > > /* > > diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c > > index 7a9c04920505a..c6b8cd56df2d7 100644 > > --- a/fs/xfs/libxfs/xfs_trans_resv.c > > +++ b/fs/xfs/libxfs/xfs_trans_resv.c > > @@ -19,6 +19,7 @@ > > #include "xfs_trans.h" > > #include "xfs_qm.h" > > #include "xfs_trans_space.h" > > +#include "xfs_attr.h" > > > > #define _ALLOC true > > #define _FREE false > > @@ -701,12 +702,10 @@ xfs_calc_attrinval_reservation( > > * Setting an attribute at mount time. > > * the inode getting the attribute > > * the superblock for allocations > > - * the agfs extents are allocated from > > - * the attribute btree * max depth > > - * the inode allocation btree > > + * the agf extents are allocated from > > * Since attribute transaction space is dependent on the size of the attribute, > > * the calculation is done partially at mount time and partially at runtime(see > > - * below). > > + * xfs_attr_calc_size()). > > */ > > STATIC uint > > xfs_calc_attrsetm_reservation( > > @@ -714,27 +713,7 @@ xfs_calc_attrsetm_reservation( > > { > > return XFS_DQUOT_LOGRES(mp) + > > xfs_calc_inode_res(mp, 1) + > > - xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + > > - xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH, XFS_FSB_TO_B(mp, 1)); > > -} > > - > > -/* > > - * Setting an attribute at runtime, transaction space unit per block. > > - * the superblock for allocations: sector size > > - * the inode bmap btree could join or split: max depth * block size > > - * Since the runtime attribute transaction space is dependent on the total > > - * blocks needed for the 1st bmap, here we calculate out the space unit for > > - * one block so that the caller could figure out the total space according > > - * to the attibute extent length in blocks by: > > - * ext * M_RES(mp)->tr_attrsetrt.tr_logres > > - */ > > -STATIC uint > > -xfs_calc_attrsetrt_reservation( > > - struct xfs_mount *mp) > > -{ > > - return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + > > - xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK), > > - XFS_FSB_TO_B(mp, 1)); > > + xfs_calc_buf_res(2, mp->m_sb.sb_sectsize); > > } > > > > /* > > @@ -832,6 +811,25 @@ xfs_calc_sb_reservation( > > return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); > > } > > > > +uint > > +xfs_calc_attr_res( > > + struct xfs_mount *mp, > > + struct xfs_attr_set_resv *resv) > > +{ > > + unsigned int space_blks; > > + unsigned int attr_res; > > + > > + space_blks = xfs_allocfree_log_count(mp, > > + resv->total_dablks + resv->bmbt_blks); > > + > > + attr_res = M_RES(mp)->tr_attrsetm.tr_logres + > > + xfs_calc_buf_res(resv->log_dablks, mp->m_attr_geo->blksize) + > > + xfs_calc_buf_res(resv->bmbt_blks, mp->m_sb.sb_blocksize) + > > + xfs_calc_buf_res(space_blks, mp->m_sb.sb_blocksize); > > + > > + return attr_res; > > +} > > This function could use a header comment to explain what the reservation > covers. > Sure, I will do that. > > + > > void > > xfs_trans_resv_calc( > > struct xfs_mount *mp, > > @@ -942,7 +940,7 @@ xfs_trans_resv_calc( > > resp->tr_ichange.tr_logres = xfs_calc_ichange_reservation(mp); > > resp->tr_fsyncts.tr_logres = xfs_calc_swrite_reservation(mp); > > resp->tr_writeid.tr_logres = xfs_calc_writeid_reservation(mp); > > - resp->tr_attrsetrt.tr_logres = xfs_calc_attrsetrt_reservation(mp); > > + resp->tr_attrsetrt.tr_logres = 0; > > Should this go away if it's going to end up as zero? Yes, you are right about this. I will remove this before posting the next iteration of the patchset. > > resp->tr_clearagi.tr_logres = xfs_calc_clear_agi_bucket_reservation(mp); > > resp->tr_growrtzero.tr_logres = xfs_calc_growrtzero_reservation(mp); > > resp->tr_growrtfree.tr_logres = xfs_calc_growrtfree_reservation(mp); > > diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h > > index 7241ab28cf84f..3a6a0bf21e9b1 100644 > > --- a/fs/xfs/libxfs/xfs_trans_resv.h > > +++ b/fs/xfs/libxfs/xfs_trans_resv.h > > @@ -7,6 +7,7 @@ > > #define __XFS_TRANS_RESV_H__ > > > > struct xfs_mount; > > +struct xfs_attr_set_resv; > > > > /* > > * structure for maintaining pre-calculated transaction reservations. > > @@ -91,6 +92,7 @@ struct xfs_trans_resv { > > #define XFS_ATTRSET_LOG_COUNT 3 > > #define XFS_ATTRRM_LOG_COUNT 3 > > > > +uint xfs_calc_attr_res(struct xfs_mount *mp, struct xfs_attr_set_resv *resv); > > void xfs_trans_resv_calc(struct xfs_mount *mp, struct xfs_trans_resv *resp); > > uint xfs_allocfree_log_count(struct xfs_mount *mp, uint num_ops); > > > > -- chandan