Re: [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for AGFL

Xiao Guangrong <xiaoguangrong.eric@xxxxxxxxx> · Tue, 10 Jan 2023 20:24:41 +0800

On 6/17/21 12:48, Chandan Babu R wrote:

>>>
>>> Just because we currently do a blocking flush doesn't mean we always
>>> must do a blocking flush....
>>
>> I will try to work out a solution.
>
> I believe the following should be taken into consideration to design an
> "optimistic flush delay" based solution,
> 1. Time consumed to perform a discard operation on a filesystem's block.
> 2. The size of extents that are being discarded.
> 3. Number of discard operation requests contained in a bio.
>
> AFAICT, The combinations resulting from the above make it impossible to
> calculate a time delay during which sufficient number of busy extents are
> guaranteed to have been freed so as to fill up the AGFL to the required
> levels. In other words, sufficent number of busy extents may not have been
> discarded even after the optimistic delay interval elapses.
>
> The other solution that I had thought about was to introduce a new flag for
> the second argument of xfs_log_force(). The new flag will cause
> xlog_state_do_iclog_callbacks() to wait on completion of all of the CIL ctxs
> associated with the iclog that xfs_log_force() would be waiting on. Hence, a
> call to xfs_log_force(mp, NEW_SYNC_FLAG) will return only after all the busy
> extents associated with the iclog are discarded.
>
> However, this method is also flawed as described below.
>
> ----------------------------------------------------------
>   Task A                        Task B
> ----------------------------------------------------------
>   Submit a filled up iclog
>   for write operation
>   (Assume that the iclog
>   has non-zero number of CIL
>   ctxs associated with it).
>   On completion of iclog write
>   operation, discard requests
>   for busy extents are issued.
>
>   Write log records (including
>   commit record) into another
>   iclog.
>
>                                 A task which is trying
>                                 to fill AGFL will now
>                                 invoke xfs_log_force()
>                                 with the new sync
>                                 flag.
>                                 Submit the 2nd iclog which
>                                 was partially filled by
>                                 Task A.
>                                 If there are no
>                                 discard requests
>                                 associated this iclog,
>                                 xfs_log_force() will
>                                 return. As the discard
>                                 requests associated with
>                                 the first iclog are yet
>                                 to be completed,
>                                 we end up incorrectly
>                                 concluding that
>                                 all busy extents
>                                 have been processed.
> ----------------------------------------------------------
>
> The inconsistency indicated above could also occur when discard requests
> issued against second iclog get processed before discard requests associated
> with the first iclog.
>
> XFS_EXTENT_BUSY_IN_TRANS flag based solution is the only method that I can
> think of that can solve this problem correctly. However I do agree with your
> earlier observation that we should not flush busy extents unless we have
> checked for presence of free extents in the btree records present on the left
> side of the btree cursor.
>

Hi Chandan,

Thanks for your great work. Do you have any update on these patches?

We met the same issue on the 4.19 kernel, I am not sure if the work has already
been merged in the upstream kernel.