Re: [PATCH 8/8] xfs: intent item whiteouts

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 27 Apr 2022 15:47:57 +1000

On Tue, Apr 26, 2022 at 08:32:52PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 27, 2022 at 12:22:59PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > When we log modifications based on intents, we add both intent
> > and intent done items to the modification being made. These get
> > written to the log to ensure that the operation is re-run if the
> > intent done is not found in the log.
> > 
> > However, for operations that complete wholly within a single
> > checkpoint, the change in the checkpoint is atomic and will never
> > need replay. In this case, we don't need to actually write the
> > intent and intent done items to the journal because log recovery
> > will never need to manually restart this modification.
> > 
> > Log recovery currently handles intent/intent done matching by
> > inserting the intent into the AIL, then removing it when a matching
> > intent done item is found. Hence for all the intent-based operations
> > that complete within a checkpoint, we spend all that time parsing
> > the intent/intent done items just to cancel them and do nothing with
> > them.
> > 
> > Hence it follows that the only time we actually need intents in the
> > log is when the modification crosses checkpoint boundaries in the
> > log and so may only be partially complete in the journal. Hence if
> > we commit and intent done item to the CIL and the intent item is in
> > the same checkpoint, we don't actually have to write them to the
> > journal because log recovery will always cancel the intents.
> > 
> > We've never really worried about the overhead of logging intents
> > unnecessarily like this because the intents we log are generally
> > very much smaller than the change being made. e.g. freeing an extent
> > involves modifying at lease two freespace btree blocks and the AGF,
> > so the EFI/EFD overhead is only a small increase in space and
> > processing time compared to the overall cost of freeing an extent.
> 
> Question: If we whiteout enough intent items, does that make it possible
> to cram more updates into a checkpoint?

Yes - we release the space the cancelled intent pair used from the
ctx->space_used counter that tracks the size of the CIL checkpoint.

> Are changes required to the existing intent item code to support
> whiteouts, or does the log code autodetect an *I/*D pair in the same
> checkpoint and elide them automatically?

The log code automagically detects it. That's what the ->iop_intent
op is for - when a done intent committed, it looks up it's intent
pair via ->iop_intent and then checks if it is in the current
checkpoint via xlog_item_in_current_chkpt() and if that returns true
then we place a whiteout on the intent and release the space it
consumes.

We don't cull the intent from the CIL until the context checkpoint
commits - we could remove it immediately, but then when the CIL
scalability code gets placed on top of this we can't remove log
items from the per-cpu CIL in transaction commit context and so we
have to use whiteouts to delay removal to the push context. So I
just implemented it that way to start with....

> (I might be fishing here for "Does this make generic/447 faster when it
> deletes the million extents?")

I think it knocks it down a little - maybe 10%? One machine would
appear to drop 126->116s, another it is 52->48s.

The intents are generally tiny compared to the rest of the changes
being (re-)logged, so I'm not expecting miracles for the
rmap/reflink code. The big wins come when intents contain large
chunks of data...

> > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> > index 59aa5f9bf769..670d074a71cc 100644
> > --- a/fs/xfs/xfs_bmap_item.c
> > +++ b/fs/xfs/xfs_bmap_item.c
> > @@ -39,6 +39,7 @@ STATIC void
> >  xfs_bui_item_free(
> >  	struct xfs_bui_log_item	*buip)
> >  {
> > +	kmem_free(buip->bui_item.li_lv_shadow);
> 
> Why is it necessary for log items to free their own shadow buffer?

Twisty unpin passages...

Intents with whiteouts on them were leaking them when they
were unpinned from the whiteout list in xlog_cil_push_work(). The
log vectors no longer get attached to the CIL context and freed
via xlog_cil_committed()->xlog_cil_free_logvec(), and so when they
are unpinned by xlog_cil_push_work() the last reference is released
and we have to free the log vector attached to the item as it is
still attached.

The reason we can't do it directly from ->iop_unpin() is that we
also call ->iop_unpin from xlog_cil_committed()->
xfs_trans_committed_bulk(), and if we are aborting there we do not
want to free the shadow buffer because it is still linked into the
lv chain attached to the CIL ctx and will get freed once
xfs_trans_committed_bulk() returns....

> > @@ -1393,7 +1463,11 @@ xlog_cil_commit(
> >  	/* lock out background commit */
> >  	down_read(&cil->xc_ctx_lock);
> >  
> > -	xlog_cil_insert_items(log, tp);
> > +	if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE)
> > +		released_space = xlog_cil_process_intents(cil, tp);
> > +
> > +	xlog_cil_insert_items(log, tp, released_space);
> > +	tp->t_ticket->t_curr_res += released_space;
> 
> I'm a little tired, so why isn't this adjustment a part of
> xlog_cil_insert_items?  A similar adjustment is made to
> ctx->space_used to release the unused space back to the committing tx,
> right?

Probably because it was a bug fix I added at some point and not
original code....

I'm not fussed where it ends up - I can move it if you want.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx