On Thu, Aug 22, 2019 at 12:33:23PM +0800, kaixuxia wrote: > When performing rename operation with RENAME_WHITEOUT flag, we will > hold AGF lock to allocate or free extents in manipulating the dirents > firstly, and then doing the xfs_iunlink_remove() call last to hold > AGI lock to modify the tmpfile info, so we the lock order AGI->AGF. > > The big problem here is that we have an ordering constraint on AGF > and AGI locking - inode allocation locks the AGI, then can allocate > a new extent for new inodes, locking the AGF after the AGI. Hence > the ordering that is imposed by other parts of the code is AGI before > AGF. So we get the ABBA agi&agf deadlock here. 'So we get an ABBA deadlock between the AGI and AGF here." Can you also change the subject line to "AGI and AGF" instead of "agi&agf" which isn't easily searchable? e.g. "xfs: fix ABBA deadlock between AGI and AGF in rename()". > /* > - * Set up the target. > + * Check for expected errors before we dirty the transaction > + * so we can return an error without a transaction abort. > */ > - if (target_ip == NULL) { > + if (!target_ip && !spaceres) { > /* > * If there's no space reservation, check the entry will > * fit before actually inserting it. > */ > - if (!spaceres) { > - error = xfs_dir_canenter(tp, target_dp, target_name); > - if (error) > - goto out_trans_cancel; > - } > + error = xfs_dir_canenter(tp, target_dp, target_name); > + if (error) > + goto out_trans_cancel; > + } else if (target_ip && S_ISDIR(VFS_I(target_ip)->i_mode) && > + (!(xfs_dir_isempty(target_ip)) || > + (VFS_I(target_ip)->i_nlink > 2))) { > + /* > + * If target exists and it's a directory, check that whether > + * it can be destroyed. > + */ > + error = -EEXIST; > + goto out_trans_cancel; > + } I do think this would be better left separate if statements like this: if (!target_ip) { /* * If there's no space reservation, check the entry will * fit before actually inserting it. */ if (!spaceres) { error = xfs_dir_canenter(tp, target_dp, target_name); if (error) goto out_trans_cancel; } } else { /* * If target exists and it's a directory, check that whether * it can be destroyed. */ if (S_ISDIR(VFS_I(target_ip)->i_mode) && (!(xfs_dir_isempty(target_ip)) || (VFS_I(target_ip)->i_nlink > 2))) { error = -EEXIST; goto out_trans_cancel; } } I find this much easier to read and follow the logic, and we don't really care if it takes a couple more lines of code to make the comments and code flow more logically. > @@ -3419,25 +3431,15 @@ struct xfs_iunlink { > > /* > * For whiteouts, we need to bump the link count on the whiteout inode. Shouldn't this line be removed as well? > - * This means that failures all the way up to this point leave the inode > - * on the unlinked list and so cleanup is a simple matter of dropping > - * the remaining reference to it. If we fail here after bumping the link > - * count, we're shutting down the filesystem so we'll never see the > - * intermediate state on disk. > + * The whiteout inode has been removed from the unlinked list and log > + * recovery will clean up the mess for the failures up to this point. > + * After this point we have a real link, clear the tmpfile state flag > + * from the inode so it doesn't accidentally get misused in future. > */ > if (wip) { > ASSERT(VFS_I(wip)->i_nlink == 0); > xfs_bumplink(tp, wip); > - error = xfs_iunlink_remove(tp, wip); > - if (error) > - goto out_trans_cancel; > xfs_trans_log_inode(tp, wip, XFS_ILOG_CORE); > - > - /* > - * Now we have a real link, clear the "I'm a tmpfile" state > - * flag from the inode so it doesn't accidentally get misused in > - * future. > - */ > VFS_I(wip)->i_state &= ~I_LINKABLE; > } Why not move all this up into the same branch that removes the whiteout from the unlinked list? Why separate this logic as none of what is left here could cause a failure even if it is run earlier? Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx