Re: [PATCH 3/9] xfs: background AIL push targets physical space, not grant space

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Fri, 26 Aug 2022 16:49:41 -0700

On Fri, Aug 26, 2022 at 08:47:35AM -0700, Darrick J. Wong wrote:
> On Tue, Aug 23, 2022 at 12:01:03PM +1000, Dave Chinner wrote:
> > On Mon, Aug 22, 2022 at 12:00:03PM -0700, Darrick J. Wong wrote:
> > > On Wed, Aug 10, 2022 at 09:03:47AM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > 
> > > > Currently the AIL attempts to keep 25% of the "log space" free,
> > > > where the current used space is tracked by the reserve grant head.
> > > > That is, it tracks both physical space used plus the amount reserved
> > > > by transactions in progress.
> > > > 
> > > > When we start tail pushing, we are trying to make space for new
> > > > reservations by writing back older metadata. The log is generally
> > > > physically full of dirty metadata, and reservations for modifications
> > > > in flight take up whatever space the AIL can physically free up.
> > > > 
> > > > Hence we don't really need to take into account the reservation
> > > > space that has been used - we just need to keep the log tail moving
> > > > as fast as we can to free up space for more reservations to be made.
> > > > We know exactly how much physical space the journal is consuming in
> > > > the AIL (i.e. max LSN - min LSN) so we can base push thresholds
> > > > directly on this state rather than have to look at grant head
> > > > reservations to determine how much to physically push out of the
> > > > log.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > Makes sense, I think.  Though I was wondering about the last patch --
> > > pushing the AIL until it's empty when a trans_alloc can't find grant
> > > reservation could take a while on slow storage.

Now that I've had a chance to see where we're going...
Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> > 
> > The push in the grant reservation code is not a blocking push - it
> > just tells the AIL to start pushing everything, then it goes to
> > sleep waiting for the tail to move and space to become available. The
> > AIL behaviour is largely unchanged, especially if the application is
> > running under even slight memory pressure as the inode shrinker will
> > repeatedly kick the AIL push-all trigger regardless of consumed
> > journal/grant space.
> 
> Ok.
> 
> > > Does this mean that
> > > we're trading the incremental freeing-up of the existing code for
> > > potentially higher transaction allocation latency in the hopes that more
> > > threads can get reservation?  Or do the "keep the AIL going" bits make
> > > up for that?
> > 
> > So far I've typically measured slightly lower worst case latencies
> > with this mechanism than with the existing "repeatedly push to 25%
> > free" that we currently have. It's not really significant enough to
> > make statements about (unlike cpu usage reductions or perf
> > increases), but it does seem to be a bit better...
> 
> <nod>
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@xxxxxxxxxxxxx
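
For readers following the thread, the idea in the quoted commit message
(base the push threshold on the physical span of the AIL, max LSN minus
min LSN, rather than on the reserve grant head) can be sketched roughly
as below. This is a simplified illustration, not the code from the
patch: struct lsn, ail_used_blocks() and ail_needs_push() are
hypothetical names, the LSN is modelled as a plain (cycle, block) pair,
and the 25% figure is taken from the description of the current
behaviour, so whether the patch keeps that exact threshold is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified LSN: which pass over the circular log we are
 * on (cycle) and the block offset within the log (block).
 */
struct lsn {
	uint32_t cycle;
	uint32_t block;
};

/*
 * Physical log space pinned by the AIL, in blocks: the distance from the
 * oldest item (min LSN, i.e. the log tail) to the newest item (max LSN).
 * Assumes min is not ahead of max and that they are at most one cycle
 * apart, which is all a circular log of log_blocks blocks can hold.
 */
uint64_t ail_used_blocks(struct lsn min, struct lsn max, uint32_t log_blocks)
{
	if (max.cycle == min.cycle)
		return (uint64_t)max.block - min.block;

	/* The head has wrapped around the end of the log once. */
	return (uint64_t)log_blocks - min.block + max.block;
}

/*
 * Decide whether background pushing should run. Only physical usage is
 * consulted; grant head reservations play no part in the decision.
 * The "less than 25% free" threshold is an assumption for illustration.
 */
bool ail_needs_push(struct lsn min, struct lsn max, uint32_t log_blocks)
{
	uint64_t used = ail_used_blocks(min, max, log_blocks);
	uint64_t free_blocks = (uint64_t)log_blocks - used;

	return free_blocks * 4 < log_blocks;
}
```

With a 1000-block log, a tail at (cycle 1, block 200) and a head at
(cycle 2, block 100) pin 900 blocks, so this sketch would ask for a
push. A tail at (1, 900) with the same head pins only 200 blocks and
would not.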