On Wed, Apr 27, 2022 at 03:47:57PM +1000, Dave Chinner wrote: > On Tue, Apr 26, 2022 at 08:32:52PM -0700, Darrick J. Wong wrote: > > On Wed, Apr 27, 2022 at 12:22:59PM +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > When we log modifications based on intents, we add both intent > > > and intent done items to the modification being made. These get > > > written to the log to ensure that the operation is re-run if the > > > intent done is not found in the log. > > > > > > However, for operations that complete wholly within a single > > > checkpoint, the change in the checkpoint is atomic and will never > > > need replay. In this case, we don't need to actually write the > > > intent and intent done items to the journal because log recovery > > > will never need to manually restart this modification. > > > > > > Log recovery currently handles intent/intent done matching by > > > inserting the intent into the AIL, then removing it when a matching > > > intent done item is found. Hence for all the intent-based operations > > > that complete within a checkpoint, we spend all that time parsing > > > the intent/intent done items just to cancel them and do nothing with > > > them. > > > > > > Hence it follows that the only time we actually need intents in the > > > log is when the modification crosses checkpoint boundaries in the > > > log and so may only be partially complete in the journal. Hence if > > > we commit and intent done item to the CIL and the intent item is in > > > the same checkpoint, we don't actually have to write them to the > > > journal because log recovery will always cancel the intents. > > > > > > We've never really worried about the overhead of logging intents > > > unnecessarily like this because the intents we log are generally > > > very much smaller than the change being made. e.g. freeing an extent > > > involves modifying at lease two freespace btree blocks and the AGF, > > > so the EFI/EFD overhead is only a small increase in space and > > > processing time compared to the overall cost of freeing an extent. > > > > Question: If we whiteout enough intent items, does that make it possible > > to cram more updates into a checkpoint? > > Yes - we release the space the cancelled intent pair used from the > ctx->space_used counter that tracks the size of the CIL checkpoint. <nod> Good, that's what I was thinking. > > Are changes required to the existing intent item code to support > > whiteouts, or does the log code autodetect an *I/*D pair in the same > > checkpoint and elide them automatically? > > The log code automagically detects it. That's what the ->iop_intent > op is for - when a done intent committed, it looks up it's intent > pair via ->iop_intent and then checks if it is in the current > checkpoint via xlog_item_in_current_chkpt() and if that returns true > then we place a whiteout on the intent and release the space it > consumes. > > We don't cull the intent from the CIL until the context checkpoint > commits - we could remove it immediately, but then when the CIL > scalability code gets placed on top of this we can't remove log > items from the per-cpu CIL in transaction commit context and so we > have to use whiteouts to delay removal to the push context. So I > just implemented it that way to start with.... <nod> Sounds reasonable to me. > > (I might be fishing here for "Does this make generic/447 faster when it > > deletes the million extents?") > > I think it knocks it down a little - maybe 10%? One machine would > appear to drop 126->116s, another it is 52->48s. > > The intents are generally tiny compared to the rest of the changes > being (re-)logged, so I'm not expecting miracles for the > rmap/reflink code. The big wins come when intents contain large > chunks of data... I'll take a 10% win. :) > > > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c > > > index 59aa5f9bf769..670d074a71cc 100644 > > > --- a/fs/xfs/xfs_bmap_item.c > > > +++ b/fs/xfs/xfs_bmap_item.c > > > @@ -39,6 +39,7 @@ STATIC void > > > xfs_bui_item_free( > > > struct xfs_bui_log_item *buip) > > > { > > > + kmem_free(buip->bui_item.li_lv_shadow); > > > > Why is it necessary for log items to free their own shadow buffer? > > Twisty unpin passages... > > Intents with whiteouts on them were leaking them when they > were unpinned from the whiteout list in xlog_cil_push_work(). The > log vectors no longer get attached to the CIL context and freed > via xlog_cil_committed()->xlog_cil_free_logvec(), and so when they > are unpinned by xlog_cil_push_work() the last reference is released > and we have to free the log vector attached to the item as it is > still attached. > > The reason we can't do it directly from ->iop_unpin() is that we > also call ->iop_unpin from xlog_cil_committed()-> > xfs_trans_committed_bulk(), and if we are aborting there we do not > want to free the shadow buffer because it is still linked into the > lv chain attached to the CIL ctx and will get freed once > xfs_trans_committed_bulk() returns.... Huh. So now that I'm more awake, I noticed that you didn't patch xfs_extfree_item.c to free the shadow buffers because the xfs_ef[id]_item_free functions already have code to free the shadow buffer. git blame says that was added in: b1c5ebb21301 ("xfs: allocate log vector buffers outside CIL context lock") This commit was added in 4.8-rc1, and just prior to merging the rmap patches. When do log intent items get shadow buffers, since they should only be committed once? Looking at that old commit, I think what's going on is that we preallocate the shadow buffers for every log item at commit time to avoid a memory allocation when we have the ctx lock, so now it's necessary for all log intent items to free them? Does that mean RUI/CUI/BUI log items could have been leaking shadow buffers since the beginning, and we just haven't noticed because the CIL has freed them for us? Which means that the changes to xfs_*_item.c could, in theory, be a separate patch that fixes a theoretical memory leak? (I'm not asking for you to separate the changes; I'm checking my understanding of something that caused me to go "Eh???" on first reading.) > > > @@ -1393,7 +1463,11 @@ xlog_cil_commit( > > > /* lock out background commit */ > > > down_read(&cil->xc_ctx_lock); > > > > > > - xlog_cil_insert_items(log, tp); > > > + if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE) > > > + released_space = xlog_cil_process_intents(cil, tp); > > > + > > > + xlog_cil_insert_items(log, tp, released_space); > > > + tp->t_ticket->t_curr_res += released_space; > > > > I'm a little tired, so why isn't this adjustment a part of > > xlog_cil_insert_items? A similar adjustment is made to > > ctx->space_used to release the unused space back to the committing tx, > > right? > > Probably because it was a bug fix I added at some point and not > original code.... > > I'm not fussed where it ends up - I can move it if you want. Yes please, since xlog_cil_insert_items already adjusts tp->t_ticket->t_curr_res and it would seem to make more sense to keep all those adjustments together. --D > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx