On Wed, May 17, 2023 at 07:14:32PM +0000, Wengang Wang wrote: > > On May 16, 2023, at 6:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Tue, May 16, 2023 at 05:59:13PM -0700, Darrick J. Wong wrote: > > I was thinking this code changes to: > > > > flags |= XFS_ALLOC_FLAG_TRY_FLUSH; > > .... > > <attempt allocation> > > .... > > if (busy) { > > xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR); > > trace_xfs_alloc_size_busy(args); > > error = xfs_extent_busy_flush(args->tp, args->pag, > > busy_gen, flags); > > if (!error) { > > flags &= ~XFS_ALLOC_FLAG_TRY_FLUSH; > > What’s the benefits to use XFS_ALLOC_FLAG_TRY_FLUSH? > If no change happened to pagb_gen, we would get nothing good in the retry > but waste cycles. Or I missed something? You missed something: the synchronous log force is always done. The log force is what allows busy extents to be resolved - busy extents have to be committed to stable storage before they can be removed from the busy extent tree. If online discards are not enabled, busy extents are resolved directly in journal IO completion - the log force waits for this to occur. In this case, pag->pagb_gen will have already incremented to indicate progress has been made, and we should never wait in the loop after the log force. The only time we do that is when the current transaction holds busy extents itself, and hence if the current tx holds busy extents we should not wait beyond the log force.... If online discards are enabled, then they'll be scheduled by journal IO completion. i.e. waiting on the log force guarntees pending discards have been scheduled and they'll start completing soon after the log force returns. When they complete they'll start incrementing pag->pagb_gen. This is the case the pag->pagb_gen wait loop exists for - it waits for a discard to complete and resolve the busy extent in it's IO compeltion routine. At which point the allocation attempt can restart. However, the same caveat about the current tx holding busy extents still exists - we can't tell the difference between "discards scheduled but not completed" and "no busy extents to resolve" in the flush code. Hence regardless of the online discard feature state, we should not be waiting on busy extent generation changes if we hold busy extents in the transaction.... IOWs, the TRY_FLUSH code reflects the fact that for most users, the log force resolves the busy extents, not the wait loop on pag->pagb_gen changing. The wait loop only really kicks in when online discard is active, and in that case we really do want to retry allocation without waiting for (potentially very slow) discards to complete first. We'll do that "wait for discards" the second time we fail to find a non-busy extent.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx