On 6/17/21 12:48, Chandan Babu R wrote: >>> >>> Just because we currently do a blocking flush doesn't mean we always >>> must do a blocking flush.... >> >> I will try to work out a solution. > > I believe the following should be taken into consideration to design an > "optimistic flush delay" based solution, > 1. Time consumed to perform a discard operation on a filesystem's block. > 2. The size of extents that are being discarded. > 3. Number of discard operation requests contained in a bio. > > AFAICT, The combinations resulting from the above make it impossible to > calculate a time delay during which sufficient number of busy extents are > guaranteed to have been freed so as to fill up the AGFL to the required > levels. In other words, sufficent number of busy extents may not have been > discarded even after the optimistic delay interval elapses. > > The other solution that I had thought about was to introduce a new flag for > the second argument of xfs_log_force(). The new flag will cause > xlog_state_do_iclog_callbacks() to wait on completion of all of the CIL ctxs > associated with the iclog that xfs_log_force() would be waiting on. Hence, a > call to xfs_log_force(mp, NEW_SYNC_FLAG) will return only after all the busy > extents associated with the iclog are discarded. > > However, this method is also flawed as described below. > > ---------------------------------------------------------- > Task A Task B > ---------------------------------------------------------- > Submit a filled up iclog > for write operation > (Assume that the iclog > has non-zero number of CIL > ctxs associated with it). > On completion of iclog write > operation, discard requests > for busy extents are issued. > > Write log records (including > commit record) into another > iclog. > > A task which is trying > to fill AGFL will now > invoke xfs_log_force() > with the new sync > flag. > Submit the 2nd iclog which > was partially filled by > Task A. > If there are no > discard requests > associated this iclog, > xfs_log_force() will > return. As the discard > requests associated with > the first iclog are yet > to be completed, > we end up incorrectly > concluding that > all busy extents > have been processed. > ---------------------------------------------------------- > > The inconsistency indicated above could also occur when discard requests > issued against second iclog get processed before discard requests associated > with the first iclog. > > XFS_EXTENT_BUSY_IN_TRANS flag based solution is the only method that I can > think of that can solve this problem correctly. However I do agree with your > earlier observation that we should not flush busy extents unless we have > checked for presence of free extents in the btree records present on the left > side of the btree cursor. > Hi Chandan, Thanks for your great work. Do you have any update on these patches? We met the same issue on the 4.19 kernel, I am not sure if the work has already been merged in the upstream kernel.