Re: [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for AGFL

Xiao Guangrong <xiaoguangrong.eric@xxxxxxxxx> · Wed, 11 Jan 2023 11:14:11 +0800

Okay :)

I am going to try to reproduce it, and will follow up on this thread if I find anything.

Thanks!

On Tue, Jan 10, 2023 at 8:52 PM Chandan Babu R <chandan.babu@xxxxxxxxxx> wrote:
>
> On Tue, Jan 10, 2023 at 08:24:41 PM +0800, Xiao Guangrong wrote:
> > On 6/17/21 12:48, Chandan Babu R wrote:
> >
> >>>>
> >>>> Just because we currently do a blocking flush doesn't mean we always
> >>>> must do a blocking flush....
> >>>
> >>> I will try to work out a solution.
> >>
> >> I believe the following should be taken into consideration when designing
> >> an "optimistic flush delay" based solution:
> >> 1. The time consumed to perform a discard operation on a filesystem block.
> >> 2. The size of the extents that are being discarded.
> >> 3. The number of discard requests contained in a bio.
> >>
> >> AFAICT, the combinations resulting from the above make it impossible to
> >> calculate a time delay within which a sufficient number of busy extents is
> >> guaranteed to have been freed to fill the AGFL to the required level. In
> >> other words, a sufficient number of busy extents may still not have been
> >> discarded when the optimistic delay interval elapses.
> >>
> >> The other solution that I had thought about was to introduce a new flag for
> >> the second argument of xfs_log_force(). The new flag would cause
> >> xlog_state_do_iclog_callbacks() to wait for the completion of all of the CIL
> >> ctxs associated with the iclog that xfs_log_force() would be waiting on.
> >> Hence, a call to xfs_log_force(mp, NEW_SYNC_FLAG) would return only after
> >> all the busy extents associated with that iclog have been discarded.
> >>
> >> However, this method is also flawed as described below.
> >>
> >> ----------------------------------------------------------
> >>   Task A                        Task B
> >> ----------------------------------------------------------
> >>   Submit a filled up iclog
> >>   for write operation
> >>   (Assume that the iclog
> >>   has non-zero number of CIL
> >>   ctxs associated with it).
> >>   On completion of iclog write
> >>   operation, discard requests
> >>   for busy extents are issued.
> >>
> >>   Write log records (including
> >>   commit record) into another
> >>   iclog.
> >>
> >>                                 A task which is trying
> >>                                 to fill AGFL will now
> >>                                 invoke xfs_log_force()
> >>                                 with the new sync
> >>                                 flag.
> >>                                 Submit the 2nd iclog which
> >>                                 was partially filled by
> >>                                 Task A.
> >>                                 If there are no
> >>                                 discard requests
> >>                                 associated with this
> >>                                 iclog, xfs_log_force()
> >>                                 will return. As the
> >>                                 discard requests
> >>                                 associated with the
> >>                                 first iclog are yet
> >>                                 to be completed,
> >>                                 we end up incorrectly
> >>                                 concluding that
> >>                                 all busy extents
> >>                                 have been processed.
> >> ----------------------------------------------------------
> >>
> >> The inconsistency indicated above could also occur when discard requests
> >> issued against the second iclog are processed before the discard requests
> >> associated with the first iclog.
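To make the race in the timeline concrete, here is a toy user-space model in C. It is not XFS code: `struct iclog_model`, `sync_force_returns()` and `pending_discards_total()` are hypothetical names, and the model assumes the proposed sync flag waits only on the discards tied to the iclog being forced, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only; these names are hypothetical and do not exist in XFS.
 * Each element stands for one iclog and counts the discard requests
 * it issued that have not completed yet.
 */
struct iclog_model {
	int pending_discards;
};

/*
 * Assumed semantics of the proposed sync flag: the force waits only for
 * discards attached to the iclog it submits, so it returns as soon as
 * that particular iclog has none outstanding.
 */
static bool sync_force_returns(const struct iclog_model *forced)
{
	return forced->pending_discards == 0;
}

/* Busy extents still awaiting discard, summed across all iclogs. */
static int pending_discards_total(const struct iclog_model *logs, size_t n)
{
	int total = 0;

	assert(logs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += logs[i].pending_discards;
	return total;
}
```

With Task A's first iclog holding, say, three in-flight discards and the second iclog holding none (`struct iclog_model logs[2] = { { 3 }, { 0 } };`), `sync_force_returns(&logs[1])` is true while `pending_discards_total(logs, 2)` is still 3: the force returns even though busy extents from the first iclog have not been discarded.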
> >>
> >> An XFS_EXTENT_BUSY_IN_TRANS flag based solution is the only method that I
> >> can think of that solves this problem correctly. However, I do agree with
> >> your earlier observation that we should not flush busy extents unless we
> >> have checked for the presence of free extents in the btree records on the
> >> left side of the btree cursor.
> >>
> >
> > Hi Chandan,
> >
> > Thanks for your great work. Do you have any update on these patches?
> >
> > We hit the same issue on a 4.19 kernel. I am not sure whether this work has
> > already been merged into the upstream kernel.
>
> Sorry, the machine on which the problem was reproduced broke, and I wasn't
> able to recreate this bug on my new work setup. Hence, I didn't pursue
> working on this bug any further.
>
> --
> chandan