Re: [PATCH v4] xfs: Fix agi&agf ABBA deadlock when performing rename with RENAME_WHITEOUT flag

kaixuxia <xiakaixu1987@xxxxxxxxx> · Thu, 22 Aug 2019 13:45:48 +0800



On 2019/8/22 13:01, Dave Chinner wrote:
> On Thu, Aug 22, 2019 at 12:33:23PM +0800, kaixuxia wrote:
>> When performing rename operation with RENAME_WHITEOUT flag, we will
>> hold AGF lock to allocate or free extents in manipulating the dirents
>> firstly, and then doing the xfs_iunlink_remove() call last to hold
>> AGI lock to modify the tmpfile info, so we the lock order AGI->AGF.
>>
>> The big problem here is that we have an ordering constraint on AGF
>> and AGI locking - inode allocation locks the AGI, then can allocate
>> a new extent for new inodes, locking the AGF after the AGI. Hence
>> the ordering that is imposed by other parts of the code is AGI before
>> AGF. So we get the ABBA agi&agf deadlock here.
> 
> 'So we get an ABBA deadlock between the AGI and AGF here."
> 
> Can you also change the subject line to "AGI and AGF" instead of
> "agi&agf" which isn't easily searchable? e.g. "xfs: fix ABBA
> deadlock between AGI and AGF in rename()".

OK , will follow it.
> 
>>  	/*
>> -	 * Set up the target.
>> +	 * Check for expected errors before we dirty the transaction
>> +	 * so we can return an error without a transaction abort.
>>  	 */
>> -	if (target_ip == NULL) {
>> +	if (!target_ip && !spaceres) {
>>  		/*
>>  		 * If there's no space reservation, check the entry will
>>  		 * fit before actually inserting it.
>>  		 */
>> -		if (!spaceres) {
>> -			error = xfs_dir_canenter(tp, target_dp, target_name);
>> -			if (error)
>> -				goto out_trans_cancel;
>> -		}
>> +		error = xfs_dir_canenter(tp, target_dp, target_name);
>> +		if (error)
>> +			goto out_trans_cancel;
>> +	} else if (target_ip && S_ISDIR(VFS_I(target_ip)->i_mode) &&
>> +		  (!(xfs_dir_isempty(target_ip)) ||
>> +		  (VFS_I(target_ip)->i_nlink > 2))) {
>> +		/*
>> +		 * If target exists and it's a directory, check that whether
>> +		 * it can be destroyed.
>> +		 */
>> +		error = -EEXIST;
>> +		goto out_trans_cancel;
>> +	}
> 
> I do think this would be better left separate if statements like
> this:
> 
> 	if (!target_ip) {
> 		/*
> 		 * If there's no space reservation, check the entry will
> 		 * fit before actually inserting it.
> 		 */
> 		if (!spaceres) {
> 			error = xfs_dir_canenter(tp, target_dp, target_name);
> 			if (error)
> 				goto out_trans_cancel;
> 		}
> 	} else {
> 		/*
> 		 * If target exists and it's a directory, check that whether
> 		 * it can be destroyed.
> 		 */
> 		if (S_ISDIR(VFS_I(target_ip)->i_mode) &&
> 		    (!(xfs_dir_isempty(target_ip)) ||
> 		    (VFS_I(target_ip)->i_nlink > 2))) {
> 			error = -EEXIST;
> 			goto out_trans_cancel;
> 		}
> 	}
> 
> I find this much easier to read and follow the logic, and we don't
> really care if it takes a couple more lines of code to make the
> comments and code flow more logically.

Right, it is much easier to read and understand the logical.
> 
>> @@ -3419,25 +3431,15 @@ struct xfs_iunlink {
>>  
>>  	/*
>>  	 * For whiteouts, we need to bump the link count on the whiteout inode.
> 
> Shouldn't this line be removed as well?

Because the xfs_bumplink() call below will do this.
> 
>> -	 * This means that failures all the way up to this point leave the inode
>> -	 * on the unlinked list and so cleanup is a simple matter of dropping
>> -	 * the remaining reference to it. If we fail here after bumping the link
>> -	 * count, we're shutting down the filesystem so we'll never see the
>> -	 * intermediate state on disk.
>> +	 * The whiteout inode has been removed from the unlinked list and log
>> +	 * recovery will clean up the mess for the failures up to this point.
>> +	 * After this point we have a real link, clear the tmpfile state flag
>> +	 * from the inode so it doesn't accidentally get misused in future.
>>  	 */
>>  	if (wip) {
>>  		ASSERT(VFS_I(wip)->i_nlink == 0);
>>  		xfs_bumplink(tp, wip);
>> -		error = xfs_iunlink_remove(tp, wip);
>> -		if (error)
>> -			goto out_trans_cancel;
>>  		xfs_trans_log_inode(tp, wip, XFS_ILOG_CORE);
>> -
>> -		/*
>> -		 * Now we have a real link, clear the "I'm a tmpfile" state
>> -		 * flag from the inode so it doesn't accidentally get misused in
>> -		 * future.
>> -		 */
>>  		VFS_I(wip)->i_state &= ~I_LINKABLE;
>>  	}
> 
> Why not move all this up into the same branch that removes the
> whiteout from the unlinked list? Why separate this logic as none of
> what is left here could cause a failure even if it is run earlier?

Yep, it could not cause a failure if we move all this into the same
branch that xfs_iunlink_remove() call. We move the xfs_iunlink_remove()
first to preserve correct AGI/AGF locking order, and maybe it is better
we bump the link count after using the whiteout inode really, such as
xfs_dir_replace(...,wip,...) ...
> 
> Cheers,
> 
> Dave.
> 

-- 
kaixuxia