Okay :) I am going to reproduce it, and will return to this thread if I get something. Thanks! On Tue, Jan 10, 2023 at 8:52 PM Chandan Babu R <chandan.babu@xxxxxxxxxx> wrote: > > On Tue, Jan 10, 2023 at 08:24:41 PM +0800, Xiao Guangrong wrote: > > On 6/17/21 12:48, Chandan Babu R wrote: > > > >>>> > >>>> Just because we currently do a blocking flush doesn't mean we always > >>>> must do a blocking flush.... > >>> > >>> I will try to work out a solution. > >> > >> I believe the following should be taken into consideration to design an > >> "optimistic flush delay" based solution, > >> 1. Time consumed to perform a discard operation on a filesystem's block. > >> 2. The size of extents that are being discarded. > >> 3. Number of discard operation requests contained in a bio. > >> > >> AFAICT, The combinations resulting from the above make it impossible to > >> calculate a time delay during which sufficient number of busy extents are > >> guaranteed to have been freed so as to fill up the AGFL to the required > >> levels. In other words, sufficent number of busy extents may not have been > >> discarded even after the optimistic delay interval elapses. > >> > >> The other solution that I had thought about was to introduce a new flag for > >> the second argument of xfs_log_force(). The new flag will cause > >> xlog_state_do_iclog_callbacks() to wait on completion of all of the CIL ctxs > >> associated with the iclog that xfs_log_force() would be waiting on. Hence, a > >> call to xfs_log_force(mp, NEW_SYNC_FLAG) will return only after all the busy > >> extents associated with the iclog are discarded. > >> > >> However, this method is also flawed as described below. > >> > >> ---------------------------------------------------------- > >> Task A Task B > >> ---------------------------------------------------------- > >> Submit a filled up iclog > >> for write operation > >> (Assume that the iclog > >> has non-zero number of CIL > >> ctxs associated with it). > >> On completion of iclog write > >> operation, discard requests > >> for busy extents are issued. > >> > >> Write log records (including > >> commit record) into another > >> iclog. > >> > >> A task which is trying > >> to fill AGFL will now > >> invoke xfs_log_force() > >> with the new sync > >> flag. > >> Submit the 2nd iclog which > >> was partially filled by > >> Task A. > >> If there are no > >> discard requests > >> associated this iclog, > >> xfs_log_force() will > >> return. As the discard > >> requests associated with > >> the first iclog are yet > >> to be completed, > >> we end up incorrectly > >> concluding that > >> all busy extents > >> have been processed. > >> ---------------------------------------------------------- > >> > >> The inconsistency indicated above could also occur when discard requests > >> issued against second iclog get processed before discard requests associated > >> with the first iclog. > >> > >> XFS_EXTENT_BUSY_IN_TRANS flag based solution is the only method that I can > >> think of that can solve this problem correctly. However I do agree with your > >> earlier observation that we should not flush busy extents unless we have > >> checked for presence of free extents in the btree records present on the left > >> side of the btree cursor. > >> > > > > Hi Chandan, > > > > Thanks for your great work. Do you have any update on these patches? > > > > We met the same issue on the 4.19 kernel, I am not sure if the work has already > > been merged in the upstream kernel. > > Sorry, The machine on which the problem was created broke and I wasn't able to > recreate this bug on my new work setup. Hence, I didn't pursue working on this > bug. > > -- > chandan