On Mon, Aug 22, 2022 at 10:08:04AM -0700, Darrick J. Wong wrote: > On Wed, Aug 10, 2022 at 09:03:46AM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > We have a mechanism that checks the amount of log space remaining > > available every time we make a transaction reservation. If the > > amount of space is below a threshold (25% free) we push on the AIL > > to tell it to do more work. To do this, we end up calculating the > > LSN that the AIL needs to push to on every reservation and updating > > the push target for the AIL with that new target LSN. > > > > This is silly and expensive. The AIL is perfectly capable of > > calculating the push target itself, and it will always be running > > when the AIL contains objects. > > > > Modify the AIL to calculate it's 25% push target before it starts a > > push using the same reserve grant head based calculation as is > > currently used, and remove all the places where we ask the AIL to > > push to a new 25% free target. ..... > > @@ -414,6 +395,57 @@ xfsaild_push_item( > > return lip->li_ops->iop_push(lip, &ailp->ail_buf_list); > > } > > > > +/* > > + * Compute the LSN that we'd need to push the log tail towards in order to have > > + * at least 25% of the log space free. If the log free space already meets this > > + * threshold, this function returns NULLCOMMITLSN. > > + */ > > +xfs_lsn_t > > +__xfs_ail_push_target( > > + struct xfs_ail *ailp) > > +{ > > + struct xlog *log = ailp->ail_log; > > + xfs_lsn_t threshold_lsn = 0; > > + xfs_lsn_t last_sync_lsn; > > + int free_blocks; > > + int free_bytes; > > + int threshold_block; > > + int threshold_cycle; > > + int free_threshold; > > + > > + free_bytes = xlog_space_left(log, &log->l_reserve_head.grant); > > + free_blocks = BTOBBT(free_bytes); > > + > > + /* > > + * Set the threshold for the minimum number of free blocks in the > > + * log to the maximum of what the caller needs, one quarter of the > > + * log, and 256 blocks. > > + */ > > + free_threshold = log->l_logBBsize >> 2; > > + if (free_blocks >= free_threshold) > > What happened to the "free_threshold = max(free_threshold, 256);" from > the old code? Or is the documented 256 block minimum no longer > necessary? Oh, I must have dropped the comment change when fixing the last round of rebase conflicts. The minimum of 256 blocks is largely useless because the even the smallest logs we create on tiny filesystems are around 1000 filesystem blocks in size. So a minimum free threshold of 128kB (256 BBs) is always going to be less than one quarter the size of the journal.... > > @@ -454,21 +486,24 @@ xfsaild_push( > > * capture updates that occur after the sync push waiter has gone to > > * sleep. > > */ > > - if (waitqueue_active(&ailp->ail_empty)) { > > + if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) || > > + waitqueue_active(&ailp->ail_empty)) { > > lip = xfs_ail_max(ailp); > > if (lip) > > target = lip->li_lsn; > > + else > > + clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate); > > } else { > > - /* barrier matches the ail_target update in xfs_ail_push() */ > > - smp_rmb(); > > - target = ailp->ail_target; > > - ailp->ail_target_prev = target; > > + target = __xfs_ail_push_target(ailp); > > Hmm. So now the AIL decides how far it ought to push itself: until 25% > of the log is free if nobody's watching, or all the way to the end if > there are xfs_ail_push_all_sync waiters or OPSTATE_PUSH_ALL is set > because someone needs grant space? Kind of. What the target does is determine if the AIL needs to do any work before it goes back to sleep. If we haven't run out of reservation space or memory (or some other push all trigger), it will simply go back to sleep for a while if there is more than 25% of the journal space free without doing anything. If there are items in the AIL at a lower LSN than the target, it will try to push up to the target or to the point of getting stuck before going back to sleep and trying again soon after. If the OPSTATE_PUSH_ALL flag is set, it will keep updating the push target until the log is empty every time it loops. THis is slightly different behaviour to the existing "push all" code which selects a LSN to push towards and it doesn't try to push beyond that even if new items are inserted into the AIL after the push_all has been triggered. However, because push_all_sync() effectly waits until the AIL is empty (i.e. keep looping updating the push target until the AIL is empty), and async pushes never wait for it to complete, there is no practical difference between the current implementation and this one. > So the xlog*grant* callers now merely wake up the AIL and let push > whatever it will, instead of telling the AIL how far to push itself? Yes. > Does that mean that those grant callers might have to wait until the AIL > empties itself? No. The moment the log tail moves forward because of a removal from the tail of the AIL via xfs_ail_update_finish(), we call xlog_assign_tail_lsn_locked() to move the l_tail_lsn forwards and make grant space available, then we call xfs_log_space_wake() to wake up any grant waiters that are waiting on the space to be made available. The reason for using the "push all" when grant space runs out is that we can run out of grant space when there is more than 25% of the log free. Small logs are notorious for this, and we have a hack in the log callback code (xlog_state_set_callback()) where we push the AIL because the *head* moved) to ensure that we kick the AIL when we consume space in it because that can push us over the "less than 25% available" available that starts tail pushing back up again. Hence when we run out of grant space and are going to sleep, we have to consider that the grant space may be consuming almost all the log space and there is almost nothing in the AIL. In this situation, the AIL pins the tail and moving the tail forwards is the only way the grant space will come available, so we have to force the AIL to push everything to guarantee grant space will eventually be returned. Hence triggering a "push all" just before sleeping removes all the nasty corner cases we have in other parts of the code that work around the "we didn't ask the AIL to push enough to free grant space" condition that leads to log space hangs... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx