Re: [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for AGFL

Chandan Babu R <chandan.babu@xxxxxxxxxx> · Tue, 10 Jan 2023 18:18:59 +0530

On Tue, Jan 10, 2023 at 08:24:41 PM +0800, Xiao Guangrong wrote:
> On 6/17/21 12:48, Chandan Babu R wrote:
>
>>>>
>>>> Just because we currently do a blocking flush doesn't mean we always
>>>> must do a blocking flush....
>>>
>>> I will try to work out a solution.
>>
>> I believe the following should be taken into consideration to design an
>> "optimistic flush delay" based solution,
>> 1. Time consumed to perform a discard operation on a filesystem's block.
>> 2. The size of extents that are being discarded.
>> 3. Number of discard operation requests contained in a bio.
>>
>> AFAICT, The combinations resulting from the above make it impossible to
>> calculate a time delay during which sufficient number of busy extents are
>> guaranteed to have been freed so as to fill up the AGFL to the required
>> levels. In other words, sufficent number of busy extents may not have been
>> discarded even after the optimistic delay interval elapses.
>>
>> The other solution that I had thought about was to introduce a new flag for
>> the second argument of xfs_log_force(). The new flag will cause
>> xlog_state_do_iclog_callbacks() to wait on completion of all of the CIL ctxs
>> associated with the iclog that xfs_log_force() would be waiting on. Hence, a
>> call to xfs_log_force(mp, NEW_SYNC_FLAG) will return only after all the busy
>> extents associated with the iclog are discarded.
>>
>> However, this method is also flawed as described below.
>>
>> ----------------------------------------------------------
>>   Task A                        Task B
>> ----------------------------------------------------------
>>   Submit a filled up iclog
>>   for write operation
>>   (Assume that the iclog
>>   has non-zero number of CIL
>>   ctxs associated with it).
>>   On completion of iclog write
>>   operation, discard requests
>>   for busy extents are issued.
>>
>>   Write log records (including
>>   commit record) into another
>>   iclog.
>>
>>                                 A task which is trying
>>                                 to fill AGFL will now
>>                                 invoke xfs_log_force()
>>                                 with the new sync
>>                                 flag.
>>                                 Submit the 2nd iclog which
>>                                 was partially filled by
>>                                 Task A.
>>                                 If there are no
>>                                 discard requests
>>                                 associated this iclog,
>>                                 xfs_log_force() will
>>                                 return. As the discard
>>                                 requests associated with
>>                                 the first iclog are yet
>>                                 to be completed,
>>                                 we end up incorrectly
>>                                 concluding that
>>                                 all busy extents
>>                                 have been processed.
>> ----------------------------------------------------------
>>
>> The inconsistency indicated above could also occur when discard requests
>> issued against second iclog get processed before discard requests associated
>> with the first iclog.
>>
>> XFS_EXTENT_BUSY_IN_TRANS flag based solution is the only method that I can
>> think of that can solve this problem correctly. However I do agree with your
>> earlier observation that we should not flush busy extents unless we have
>> checked for presence of free extents in the btree records present on the left
>> side of the btree cursor.
>>
>
> Hi Chandan,
>
> Thanks for your great work. Do you have any update on these patches?
>
> We met the same issue on the 4.19 kernel, I am not sure if the work has already
> been merged in the upstream kernel.

Sorry, The machine on which the problem was created broke and I wasn't able to
recreate this bug on my new work setup. Hence, I didn't pursue working on this
bug.

-- 
chandan