On Tue, Jan 10, 2023 at 08:24:41 PM +0800, Xiao Guangrong wrote: > On 6/17/21 12:48, Chandan Babu R wrote: > >>>> >>>> Just because we currently do a blocking flush doesn't mean we always >>>> must do a blocking flush.... >>> >>> I will try to work out a solution. >> >> I believe the following should be taken into consideration to design an >> "optimistic flush delay" based solution, >> 1. Time consumed to perform a discard operation on a filesystem's block. >> 2. The size of extents that are being discarded. >> 3. Number of discard operation requests contained in a bio. >> >> AFAICT, The combinations resulting from the above make it impossible to >> calculate a time delay during which sufficient number of busy extents are >> guaranteed to have been freed so as to fill up the AGFL to the required >> levels. In other words, sufficent number of busy extents may not have been >> discarded even after the optimistic delay interval elapses. >> >> The other solution that I had thought about was to introduce a new flag for >> the second argument of xfs_log_force(). The new flag will cause >> xlog_state_do_iclog_callbacks() to wait on completion of all of the CIL ctxs >> associated with the iclog that xfs_log_force() would be waiting on. Hence, a >> call to xfs_log_force(mp, NEW_SYNC_FLAG) will return only after all the busy >> extents associated with the iclog are discarded. >> >> However, this method is also flawed as described below. >> >> ---------------------------------------------------------- >> Task A Task B >> ---------------------------------------------------------- >> Submit a filled up iclog >> for write operation >> (Assume that the iclog >> has non-zero number of CIL >> ctxs associated with it). >> On completion of iclog write >> operation, discard requests >> for busy extents are issued. >> >> Write log records (including >> commit record) into another >> iclog. >> >> A task which is trying >> to fill AGFL will now >> invoke xfs_log_force() >> with the new sync >> flag. >> Submit the 2nd iclog which >> was partially filled by >> Task A. >> If there are no >> discard requests >> associated this iclog, >> xfs_log_force() will >> return. As the discard >> requests associated with >> the first iclog are yet >> to be completed, >> we end up incorrectly >> concluding that >> all busy extents >> have been processed. >> ---------------------------------------------------------- >> >> The inconsistency indicated above could also occur when discard requests >> issued against second iclog get processed before discard requests associated >> with the first iclog. >> >> XFS_EXTENT_BUSY_IN_TRANS flag based solution is the only method that I can >> think of that can solve this problem correctly. However I do agree with your >> earlier observation that we should not flush busy extents unless we have >> checked for presence of free extents in the btree records present on the left >> side of the btree cursor. >> > > Hi Chandan, > > Thanks for your great work. Do you have any update on these patches? > > We met the same issue on the 4.19 kernel, I am not sure if the work has already > been merged in the upstream kernel. Sorry, The machine on which the problem was created broke and I wasn't able to recreate this bug on my new work setup. Hence, I didn't pursue working on this bug. -- chandan