On 2019/8/22 13:01, Dave Chinner wrote: > On Thu, Aug 22, 2019 at 12:33:23PM +0800, kaixuxia wrote: >> When performing rename operation with RENAME_WHITEOUT flag, we will >> hold AGF lock to allocate or free extents in manipulating the dirents >> firstly, and then doing the xfs_iunlink_remove() call last to hold >> AGI lock to modify the tmpfile info, so we the lock order AGI->AGF. >> >> The big problem here is that we have an ordering constraint on AGF >> and AGI locking - inode allocation locks the AGI, then can allocate >> a new extent for new inodes, locking the AGF after the AGI. Hence >> the ordering that is imposed by other parts of the code is AGI before >> AGF. So we get the ABBA agi&agf deadlock here. > > 'So we get an ABBA deadlock between the AGI and AGF here." > > Can you also change the subject line to "AGI and AGF" instead of > "agi&agf" which isn't easily searchable? e.g. "xfs: fix ABBA > deadlock between AGI and AGF in rename()". OK , will follow it. > >> /* >> - * Set up the target. >> + * Check for expected errors before we dirty the transaction >> + * so we can return an error without a transaction abort. >> */ >> - if (target_ip == NULL) { >> + if (!target_ip && !spaceres) { >> /* >> * If there's no space reservation, check the entry will >> * fit before actually inserting it. >> */ >> - if (!spaceres) { >> - error = xfs_dir_canenter(tp, target_dp, target_name); >> - if (error) >> - goto out_trans_cancel; >> - } >> + error = xfs_dir_canenter(tp, target_dp, target_name); >> + if (error) >> + goto out_trans_cancel; >> + } else if (target_ip && S_ISDIR(VFS_I(target_ip)->i_mode) && >> + (!(xfs_dir_isempty(target_ip)) || >> + (VFS_I(target_ip)->i_nlink > 2))) { >> + /* >> + * If target exists and it's a directory, check that whether >> + * it can be destroyed. >> + */ >> + error = -EEXIST; >> + goto out_trans_cancel; >> + } > > I do think this would be better left separate if statements like > this: > > if (!target_ip) { > /* > * If there's no space reservation, check the entry will > * fit before actually inserting it. > */ > if (!spaceres) { > error = xfs_dir_canenter(tp, target_dp, target_name); > if (error) > goto out_trans_cancel; > } > } else { > /* > * If target exists and it's a directory, check that whether > * it can be destroyed. > */ > if (S_ISDIR(VFS_I(target_ip)->i_mode) && > (!(xfs_dir_isempty(target_ip)) || > (VFS_I(target_ip)->i_nlink > 2))) { > error = -EEXIST; > goto out_trans_cancel; > } > } > > I find this much easier to read and follow the logic, and we don't > really care if it takes a couple more lines of code to make the > comments and code flow more logically. Right, it is much easier to read and understand the logical. > >> @@ -3419,25 +3431,15 @@ struct xfs_iunlink { >> >> /* >> * For whiteouts, we need to bump the link count on the whiteout inode. > > Shouldn't this line be removed as well? Because the xfs_bumplink() call below will do this. > >> - * This means that failures all the way up to this point leave the inode >> - * on the unlinked list and so cleanup is a simple matter of dropping >> - * the remaining reference to it. If we fail here after bumping the link >> - * count, we're shutting down the filesystem so we'll never see the >> - * intermediate state on disk. >> + * The whiteout inode has been removed from the unlinked list and log >> + * recovery will clean up the mess for the failures up to this point. >> + * After this point we have a real link, clear the tmpfile state flag >> + * from the inode so it doesn't accidentally get misused in future. >> */ >> if (wip) { >> ASSERT(VFS_I(wip)->i_nlink == 0); >> xfs_bumplink(tp, wip); >> - error = xfs_iunlink_remove(tp, wip); >> - if (error) >> - goto out_trans_cancel; >> xfs_trans_log_inode(tp, wip, XFS_ILOG_CORE); >> - >> - /* >> - * Now we have a real link, clear the "I'm a tmpfile" state >> - * flag from the inode so it doesn't accidentally get misused in >> - * future. >> - */ >> VFS_I(wip)->i_state &= ~I_LINKABLE; >> } > > Why not move all this up into the same branch that removes the > whiteout from the unlinked list? Why separate this logic as none of > what is left here could cause a failure even if it is run earlier? Yep, it could not cause a failure if we move all this into the same branch that xfs_iunlink_remove() call. We move the xfs_iunlink_remove() first to preserve correct AGI/AGF locking order, and maybe it is better we bump the link count after using the whiteout inode really, such as xfs_dir_replace(...,wip,...) ... > > Cheers, > > Dave. > -- kaixuxia