On 2019/11/6 12:56, Darrick J. Wong wrote: > On Tue, Nov 05, 2019 at 05:52:12PM +0800, kaixuxia wrote: >> When target_ip exists in xfs_rename(), the xfs_dir_replace() call may >> need to hold the AGF lock to allocate more blocks, and then invoking >> the xfs_droplink() call to hold AGI lock to drop target_ip onto the >> unlinked list, so we get the lock order AGF->AGI. This would break the >> ordering constraint on AGI and AGF locking - inode allocation locks >> the AGI, then can allocate a new extent for new inodes, locking the >> AGF after the AGI. >> >> In this patch we check whether the replace operation need more >> blocks firstly. If so, acquire the agi lock firstly to preserve >> locking order(AGI/AGF). Actually, the locking order problem only >> occurs when we are locking the AGI/AGF of the same AG. For multiple >> AGs the AGI lock will be released after the transaction committed. >> >> Signed-off-by: kaixuxia <kaixuxia@xxxxxxxxxxx> >> --- >> Changes in v2: >> - Add xfs_dir2_sf_replace_needblock() helper in >> xfs_dir2_sf.c. >> >> fs/xfs/libxfs/xfs_dir2.c | 23 +++++++++++++++++++++++ >> fs/xfs/libxfs/xfs_dir2.h | 2 ++ >> fs/xfs/libxfs/xfs_dir2_priv.h | 2 ++ >> fs/xfs/libxfs/xfs_dir2_sf.c | 24 ++++++++++++++++++++++++ >> fs/xfs/xfs_inode.c | 14 ++++++++++++++ >> 5 files changed, 65 insertions(+) >> >> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c >> index 867c5de..1917990 100644 >> --- a/fs/xfs/libxfs/xfs_dir2.c >> +++ b/fs/xfs/libxfs/xfs_dir2.c >> @@ -463,6 +463,29 @@ >> } >> >> /* >> + * Check whether the replace operation need more blocks. Ignore >> + * the parameters check since the real replace() call below will >> + * do that. >> + */ >> +bool >> +xfs_dir_replace_needblock( > > xfs_dir2, to be consistent. > >> + struct xfs_inode *dp, >> + xfs_ino_t inum) > > If you passed the inode pointer (instead of ip->i_ino) here then you > don't need to revalidate the inode number. > >> +{ >> + int rval; >> + >> + rval = xfs_dir_ino_validate(dp->i_mount, inum); >> + if (rval) >> + return false; >> + >> + /* >> + * Only convert the shortform directory to block form maybe >> + * need more blocks. >> + */ >> + return xfs_dir2_sf_replace_needblock(dp, inum); > > if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL) > return xfs_dir2_sf_replace_needblock(...); > > Also, do other directories formats need extra blocks allocated? Yeah, I think so. Other dirs formats only need to change the inode number to the new value and extra blocks are not necessary for them. > >> +} >> + >> +/* >> * Replace the inode number of a directory entry. >> */ >> int >> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h >> index f542447..e436c14 100644 >> --- a/fs/xfs/libxfs/xfs_dir2.h >> +++ b/fs/xfs/libxfs/xfs_dir2.h >> @@ -124,6 +124,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp, >> extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp, >> struct xfs_name *name, xfs_ino_t ino, >> xfs_extlen_t tot); >> +extern bool xfs_dir_replace_needblock(struct xfs_inode *dp, >> + xfs_ino_t inum); >> extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp, >> struct xfs_name *name, xfs_ino_t inum, >> xfs_extlen_t tot); >> diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h >> index 59f9fb2..002103f 100644 >> --- a/fs/xfs/libxfs/xfs_dir2_priv.h >> +++ b/fs/xfs/libxfs/xfs_dir2_priv.h >> @@ -116,6 +116,8 @@ extern int xfs_dir2_block_to_sf(struct xfs_da_args *args, struct xfs_buf *bp, >> extern int xfs_dir2_sf_create(struct xfs_da_args *args, xfs_ino_t pino); >> extern int xfs_dir2_sf_lookup(struct xfs_da_args *args); >> extern int xfs_dir2_sf_removename(struct xfs_da_args *args); >> +extern bool xfs_dir2_sf_replace_needblock(struct xfs_inode *dp, >> + xfs_ino_t inum); >> extern int xfs_dir2_sf_replace(struct xfs_da_args *args); >> extern xfs_failaddr_t xfs_dir2_sf_verify(struct xfs_inode *ip); >> >> diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c >> index 85f14fc..0906f91 100644 >> --- a/fs/xfs/libxfs/xfs_dir2_sf.c >> +++ b/fs/xfs/libxfs/xfs_dir2_sf.c >> @@ -945,6 +945,30 @@ static int xfs_dir2_sf_addname_pick(xfs_da_args_t *args, int objchange, >> } >> >> /* >> + * Check whether the replace operation need more blocks. >> + */ >> +bool >> +xfs_dir2_sf_replace_needblock( > > Urgggh. This is a predicate that we only ever call from xfs_rename(), > right? And it addresses a particular quirk of the locking when the > caller wants us to rename on top of an existing entry and drop the link > count of the old inode, right? So why can't this just be a predicate in > xfs_inode.c ? Nobody else needs to know this particular piece of > information, AFAICT. > > (Apologies, for Brian and I clearly aren't on the same page about > that...) Hmm... sorry, I had misunderstood Brian's mean. Right, maybe we only need the xfs_dir2_sf_replace_needblock() call, and then involve it in both places. Thanks for your comments, will address them soon and send v3. Kaixu > >> + struct xfs_inode *dp, >> + xfs_ino_t inum) >> +{ >> + int newsize; >> + xfs_dir2_sf_hdr_t *sfp; >> + >> + if (dp->i_d.di_format != XFS_DINODE_FMT_LOCAL) >> + return false; > > This check should be used up in xfs_dir2_replace_needblock() to decide > if we're calling xfs_dir2_sf_replace_needblock(), or just returning > false. > >> + >> + sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data; >> + newsize = dp->i_df.if_bytes + (sfp->count + 1) * XFS_INO64_DIFF; >> + >> + if (inum > XFS_DIR2_MAX_SHORT_INUM && >> + sfp->i8count == 0 && newsize > XFS_IFORK_DSIZE(dp)) >> + return true; >> + else >> + return false; > > return inum > XFS_DIR2_MAX_SHORT_INUM && (all the rest of that); > >> +} >> + >> +/* >> * Replace the inode number of an entry in a shortform directory. >> */ >> int /* error */ >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c >> index 18f4b26..c239070 100644 >> --- a/fs/xfs/xfs_inode.c >> +++ b/fs/xfs/xfs_inode.c >> @@ -3196,6 +3196,7 @@ struct xfs_iunlink { >> struct xfs_trans *tp; >> struct xfs_inode *wip = NULL; /* whiteout inode */ >> struct xfs_inode *inodes[__XFS_SORT_INODES]; >> + struct xfs_buf *agibp; >> int num_inodes = __XFS_SORT_INODES; >> bool new_parent = (src_dp != target_dp); >> bool src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode); >> @@ -3361,6 +3362,19 @@ struct xfs_iunlink { >> * In case there is already an entry with the same >> * name at the destination directory, remove it first. >> */ >> + >> + /* >> + * Check whether the replace operation need more blocks. >> + * If so, acquire the agi lock firstly to preserve locking > > "first" > >> + * order(AGI/AGF). > > Nit: space between "order" and "(AGI/AGF)". >> + */ >> + if (xfs_dir_replace_needblock(target_dp, src_ip->i_ino)) { >> + error = xfs_read_agi(mp, tp, >> + XFS_INO_TO_AGNO(mp, target_ip->i_ino), &agibp); > > Overly long line here. > > --D > >> + if (error) >> + goto out_trans_cancel; >> + } >> + >> error = xfs_dir_replace(tp, target_dp, target_name, >> src_ip->i_ino, spaceres); >> if (error) >> -- >> 1.8.3.1 >> -- kaixuxia