Re: [PATCH 2/9] xfs: AIL doesn't need manual pushing

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Fri, 26 Aug 2022 08:46:26 -0700

On Tue, Aug 23, 2022 at 11:51:56AM +1000, Dave Chinner wrote:
> On Mon, Aug 22, 2022 at 10:08:04AM -0700, Darrick J. Wong wrote:
> > On Wed, Aug 10, 2022 at 09:03:46AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > We have a mechanism that checks the amount of log space remaining
> > > available every time we make a transaction reservation. If the
> > > amount of space is below a threshold (25% free) we push on the AIL
> > > to tell it to do more work. To do this, we end up calculating the
> > > LSN that the AIL needs to push to on every reservation and updating
> > > the push target for the AIL with that new target LSN.
> > > 
> > > This is silly and expensive. The AIL is perfectly capable of
> > > calculating the push target itself, and it will always be running
> > > when the AIL contains objects.
> > > 
> > > Modify the AIL to calculate it's 25% push target before it starts a
> > > push using the same reserve grant head based calculation as is
> > > currently used, and remove all the places where we ask the AIL to
> > > push to a new 25% free target.
> .....
> > > @@ -414,6 +395,57 @@ xfsaild_push_item(
> > >  	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
> > >  }
> > >  
> > > +/*
> > > + * Compute the LSN that we'd need to push the log tail towards in order to have
> > > + * at least 25% of the log space free.  If the log free space already meets this
> > > + * threshold, this function returns NULLCOMMITLSN.
> > > + */
> > > +xfs_lsn_t
> > > +__xfs_ail_push_target(
> > > +	struct xfs_ail		*ailp)
> > > +{
> > > +	struct xlog	*log = ailp->ail_log;
> > > +	xfs_lsn_t	threshold_lsn = 0;
> > > +	xfs_lsn_t	last_sync_lsn;
> > > +	int		free_blocks;
> > > +	int		free_bytes;
> > > +	int		threshold_block;
> > > +	int		threshold_cycle;
> > > +	int		free_threshold;
> > > +
> > > +	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
> > > +	free_blocks = BTOBBT(free_bytes);
> > > +
> > > +	/*
> > > +	 * Set the threshold for the minimum number of free blocks in the
> > > +	 * log to the maximum of what the caller needs, one quarter of the
> > > +	 * log, and 256 blocks.
> > > +	 */
> > > +	free_threshold = log->l_logBBsize >> 2;
> > > +	if (free_blocks >= free_threshold)
> > 
> > What happened to the "free_threshold = max(free_threshold, 256);" from
> > the old code?  Or is the documented 256 block minimum no longer
> > necessary?
> 
> Oh, I must have dropped the comment change when fixing the last
> round of rebase conflicts. The minimum of 256 blocks is largely
> useless because the even the smallest logs we create on tiny
> filesystems are around 1000 filesystem blocks in size. So a minimum
> free threshold of 128kB (256 BBs) is always going to be less than
> one quarter the size of the journal....

<nod> And even more pointless now that we've effectively mandated 64M
logs for all new filesystems.

> 
> > > @@ -454,21 +486,24 @@ xfsaild_push(
> > >  	 * capture updates that occur after the sync push waiter has gone to
> > >  	 * sleep.
> > >  	 */
> > > -	if (waitqueue_active(&ailp->ail_empty)) {
> > > +	if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) ||
> > > +	    waitqueue_active(&ailp->ail_empty)) {
> > >  		lip = xfs_ail_max(ailp);
> > >  		if (lip)
> > >  			target = lip->li_lsn;
> > > +		else
> > > +			clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate);
> > >  	} else {
> > > -		/* barrier matches the ail_target update in xfs_ail_push() */
> > > -		smp_rmb();
> > > -		target = ailp->ail_target;
> > > -		ailp->ail_target_prev = target;
> > > +		target = __xfs_ail_push_target(ailp);
> > 
> > Hmm.  So now the AIL decides how far it ought to push itself: until 25%
> > of the log is free if nobody's watching, or all the way to the end if
> > there are xfs_ail_push_all_sync waiters or OPSTATE_PUSH_ALL is set
> > because someone needs grant space?
> 
> Kind of. What the target does is determine if the AIL needs to do
> any work before it goes back to sleep. If we haven't run out of
> reservation space or memory (or some other push all trigger), it
> will simply go back to sleep for a while if there is more than 25%
> of the journal space free without doing anything.
> 
> If there are items in the AIL at a lower LSN than the target, it
> will try to push up to the target or to the point of getting stuck
> before going back to sleep and trying again soon after.
> 
> If the OPSTATE_PUSH_ALL flag is set, it will keep updating the
> push target until the log is empty every time it loops. THis is
> slightly different behaviour to the existing "push all" code which
> selects a LSN to push towards and it doesn't try to push beyond that
> even if new items are inserted into the AIL after the push_all has
> been triggered.

<nod> Ok, that's what I thought I was seeing -- the target is now a
little more dynamic, which means a "push all" will be more aggressive,
with perhaps less latency spikes later.

> However, because push_all_sync() effectly waits until the AIL is
> empty (i.e. keep looping updating the push target until the AIL is
> empty), and async pushes never wait for it to complete, there is no
> practical difference between the current implementation and this
> one.
> 
> > So the xlog*grant* callers now merely wake up the AIL and let push
> > whatever it will, instead of telling the AIL how far to push itself?
> 
> Yes.
> 
> > Does that mean that those grant callers might have to wait until the AIL
> > empties itself?
> 
> No. The moment the log tail moves forward because of a removal from
> the tail of the AIL via xfs_ail_update_finish(), we call
> xlog_assign_tail_lsn_locked() to move the l_tail_lsn forwards and
> make grant space available, then we call xfs_log_space_wake() to
> wake up any grant waiters that are waiting on the space to be made
> available.

Aha!  There's the missing piece, thank you.

> The reason for using the "push all" when grant space runs out is
> that we can run out of grant space when there is more than 25% of
> the log free. Small logs are notorious for this, and we have a hack
> in the log callback code (xlog_state_set_callback()) where we push
> the AIL because the *head* moved) to ensure that we kick the AIL
> when we consume space in it because that can push us over the "less
> than 25% available" available that starts tail pushing back up
> again.

...and thank you for the reminder of why that was there, because I was
puzzling over what that (now removed) line of code was doing.

> Hence when we run out of grant space and are going to sleep, we have
> to consider that the grant space may be consuming almost all the log
> space and there is almost nothing in the AIL. In this situation, the
> AIL pins the tail and moving the tail forwards is the only way the
> grant space will come available, so we have to force the AIL to push
> everything to guarantee grant space will eventually be returned.
> Hence triggering a "push all" just before sleeping removes all the
> nasty corner cases we have in other parts of the code that work
> around the "we didn't ask the AIL to push enough to free grant
> space" condition that leads to log space hangs...

<nod> I'll resume reading now.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx