On Mon, Apr 03, 2023 at 10:55:57AM -0700, Darrick J. Wong wrote: > Hi folks, > > The scrub-strengthen-rmap-checking branch of my xfs-linux repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git > > has just been updated for your review. > > This code snapshot has been rebased against recent upstream, freshly > QA'd, and is ready for people to examine. For veteran readers, the new > snapshot can be diffed against the previous snapshot; and for new > readers, this is a reasonable place to begin reading. For the best > experience, it is recommended to pull this branch and walk the commits > instead of trying to read any patch deluge. > > Here's the rebase against 6.3-rc5 of the online fsck design > documentation and all pending scrub fixes. I've fixed most of the > low-hanging fruit that Dave commented on in #xfs. > > (This isn't a true deluge, since I'm only posting this notice, not the > entire patchset.) Notes as I go through the code (I'm ignoring the Documentation/ stuff as Allison has been going through that with a fine toothed comb). I'm simply reading the output of: $ git reset --hard v6.3-rc5 $ git merge djwong/scrub-strengthen-rmap-checking $ git diff v6.3-rc5.. fs/xfs .... and commenting as I see things, also leaning on the notes I had from the first, much slower, pass I did through this code over the past 3 weeks. --- +xfs-$(CONFIG_XFS_DRAIN_INTENTS) += xfs_drain.o "drain" is kinda generic. That shows up like this: + xfs_drain_init(&pag->pag_intents); Where it's clear that it's an intent drain. More generically, I think this is a deferred work drain which just happens to be a set of linked intents. We already have the xfs_defer_* namespace for managing deferred work chains, so perhaps this API should be named xfs_defer_drain* to indicate what subsystem it interacts with. --- xfs_perag_drain_intents -> xfs_perag_intent_drain() xfs_perag_intents_busy -> xfs_perag_intent_busy() to keep namespace consistent with _intent_hold/rele() interface. --- xfs_extent_free_get_group(), xfs_extent_free_put_group(). These are actually grabbing/dropping the perag and bumping the deferred work drain count held in the pag->pag_intents. I'd much prefer a common interface that looks like: xfs_perag_intent_get(mp, agno) { pag = xfs_perag_get(mp, agno); xfs_perag_intent_hold(pag); return pag; } and then we have: xefi->xefi_pag = xfs_perag_intent_get(mp, XFS_FSB_TO_AGNO(mp, xefi->xefi_startblock); and: xfs_perag_intent_put(xefi->xefi_pag); instead of the type specific wrappers that all do the same thing. IOWs, we have an API where: xfs_perag_intent_get() xfs_perag_intent_put() Are used to get a perag reference with an intent drain hold, and: xfs_perag_intent_hold() xfs_perag_intent_rele() Are used to bump/drop the intent drain count when the intent already has a perag reference. [ Repeat these comments for all intent _get/put_group() functions. ] --- General observation. As a cleanup - not for this patchset - we should pass the perag to tracepoints, not the agno vi pag->pag_agno. --- General observation. I like how the perag is migrating outward from the core alloc code into the callers (e.g. __xfs_free_extent()) as callers start to track perag references themselves. I also like ho9w callers are now asking for operations on {perag, agbno, len} tuples instead of {fsbno, len} tuples. --- xfs_alloc_complain_bad_rec(), xfs_bmap_complain_bad_rec(), et al. Should these also dump a trace event recording the bad record? --- General observation. ->diff_two_keys() method. Should we rename that ->keycmp()? --- xfs_rmap_count_owners_helper(): First delta calc has a cast to (int64_t), second one doesn't. Don't both need the same cast (or lack of cast)? roc->results->nono_matches++; Clever name, but it's not a "no no" match (as in a match that should never happen) - that's what "badno_matches" are. It's a non-owner match, so I think the variable names should be "nonowner_matches" and "bad_nonowner_matches" to avoid confusion when I next read the code.... --- xbitmap conversion to use interval trees. The logic looks ok and the way the callers use it look fine, but I'm not going to find off-by-one bugs in it just by reading the code. It's definitely an improvement on the previous code, though. --- xchk_perag_lock() isn't really locking the perag. It's locking the AGF+AGI and ensuring there are no deferred operation chains in progress in the AG. xchk_ag_drain_and_lock()? --- xchk_iget_agi() is a bit nasty. Lets chase that through. read/lock AGI xfs_iget(tp, NORETRY|UNTRUSTED) cache hit can't get inode because, say, NEEDGC return -EAGAIN cache miss xfs_imap xfs_imap_lookup() (because UNTRUSTED) read AGI works only because AGI is locked in this transaction. Ok, so the -EAGAIN return is only going to come through cache hit path, the cache miss path is not going to deadlock even though it needs the AGI we already have locked. Incore lookup will fail with EAGAIN if inactivation on the inode is required, and that may remove the inode from the unlinked list and hence need the AGI. Hence xchk_iget_agi() needs to drop the AGI lock on EAGAIN to allow inodegc to make progress and move the inode to IRECLAIMABLE state where xfs_iget will be able to return it again if it can lock the inode. OK. Nasty, but looks like it is deadlock free. The comment above the function needs to document this special case handling for xfs_iget() returning -EAGAIN and the dependence on having a valid transaction context for the cache miss case to avoid deadlocking in xfs_imap_lookup() on the AGI lock. --- Looks like a lot of commonality between xchk_setup_inode() and xchk_iget_for_scrubbing() - same xfs-iget/xfs_iget_agi/xfs_imap checking logic - so maybe scope for a common helper function there? --- scrub/readdir.c This looks to be a reimplementation of the normal readdir code just with a different callback to process the names that are found. I'd prefer not to have two copies of the readdir implementation in the code base - is it possible to use the filldir context and the existing readdir code to perform the scrub readdir walk function? Maybe that is for a later patchset.... --- That's all I've noticed reading through the code - I'm not going to find off-by-one errors in the logic reading through a diff of this size: 73 files changed, 4976 insertions(+), 1794 deletions(-) The code changes largely make sense, the impact on the code outside fs/xfs/scrub appears to be largely minimal and there are some good cleanups and improvements in the code. I'll start running this through my limited QA resources right now (hardware failure has taken out my large test machine) but if it's anything like the previous version, I don't expect to find any regressions. I'll probably start on the second set of commits tomorrow.... Cheers, -Dave. -- Dave Chinner david@xxxxxxxxxxxxx