Re: [ANNOUNCE 1/2] xfs-linux: scrub-strengthen-rmap-checking updated to 64e6494e1175

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 03, 2023 at 10:55:57AM -0700, Darrick J. Wong wrote:
> Hi folks,
> 
> The scrub-strengthen-rmap-checking branch of my xfs-linux repository at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git
> 
> has just been updated for your review.
> 
> This code snapshot has been rebased against recent upstream, freshly
> QA'd, and is ready for people to examine.  For veteran readers, the new
> snapshot can be diffed against the previous snapshot; and for new
> readers, this is a reasonable place to begin reading.  For the best
> experience, it is recommended to pull this branch and walk the commits
> instead of trying to read any patch deluge.
> 
> Here's the rebase against 6.3-rc5 of the online fsck design
> documentation and all pending scrub fixes.  I've fixed most of the
> low-hanging fruit that Dave commented on in #xfs.
> 
> (This isn't a true deluge, since I'm only posting this notice, not the
> entire patchset.)

Notes as I go through the code (I'm ignoring the Documentation/
stuff as Allison has been going through that with a fine toothed
comb). I'm simply reading the output of:

$ git reset --hard v6.3-rc5
$ git merge djwong/scrub-strengthen-rmap-checking
$ git diff v6.3-rc5.. fs/xfs
....

and commenting as I see things, also leaning on the notes I had from
the first, much slower, pass I did through this code over the past 3
weeks.

---

+xfs-$(CONFIG_XFS_DRAIN_INTENTS)        += xfs_drain.o

"drain" is kinda generic. That shows up like this:

+               xfs_drain_init(&pag->pag_intents);

Where it's clear that it's an intent drain. More generically, I think
this is a deferred work drain which just happens to be a set of
linked intents.

We already have the xfs_defer_* namespace for managing deferred
work chains, so perhaps this API should be named xfs_defer_drain*
to indicate what subsystem it interacts with.

---

xfs_perag_drain_intents -> xfs_perag_intent_drain()
xfs_perag_intents_busy -> xfs_perag_intent_busy()

to keep namespace consistent with _intent_hold/rele() interface.

---

xfs_extent_free_get_group(), xfs_extent_free_put_group().

These are actually grabbing/dropping the perag and bumping the
deferred work drain count held in the pag->pag_intents. I'd much
prefer a common interface that looks like:

xfs_perag_intent_get(mp, agno)
{
	pag = xfs_perag_get(mp, agno);
	xfs_perag_intent_hold(pag);
	return pag;
}

and then we have:

	xefi->xefi_pag = xfs_perag_intent_get(mp,
				XFS_FSB_TO_AGNO(mp, xefi->xefi_startblock);

and:

	xfs_perag_intent_put(xefi->xefi_pag);

instead of the type specific wrappers that all do the same thing.

IOWs, we have an API where:

xfs_perag_intent_get()
xfs_perag_intent_put()

Are used to get a perag reference with an intent drain hold, and:

xfs_perag_intent_hold()
xfs_perag_intent_rele()

Are used to bump/drop the intent drain count when the intent already
has a perag reference.

[ Repeat these comments for all intent _get/put_group() functions. ]

---

General observation.

As a cleanup - not for this patchset - we should pass the perag to
tracepoints, not the agno vi pag->pag_agno.

---

General observation.

I like how the perag is migrating outward from the core alloc code
into the callers (e.g. __xfs_free_extent()) as callers start to
track perag references themselves. I also like ho9w callers are now
asking for operations on {perag, agbno, len} tuples instead of
{fsbno, len} tuples.

---

xfs_alloc_complain_bad_rec(), xfs_bmap_complain_bad_rec(), et al.

Should these also dump a trace event recording the bad record?

---

General observation.

->diff_two_keys() method. Should we rename that ->keycmp()?

---

xfs_rmap_count_owners_helper():

First delta calc has a cast to (int64_t), second one doesn't. Don't
both need the same cast (or lack of cast)?

	roc->results->nono_matches++;

Clever name, but it's not a "no no" match (as in a match that should
never happen) - that's what "badno_matches" are.

It's a non-owner match, so I think the variable names should be
"nonowner_matches" and "bad_nonowner_matches" to avoid confusion
when I next read the code....

---

xbitmap conversion to use interval trees. The logic looks ok and the
way the callers use it look fine, but I'm not going to find
off-by-one bugs in it just by reading the code. It's definitely an
improvement on the previous code, though.

---

xchk_perag_lock() isn't really locking the perag. It's locking the
AGF+AGI and ensuring there are no deferred operation chains in
progress in the AG. xchk_ag_drain_and_lock()?

---

xchk_iget_agi() is a bit nasty. Lets chase that through.

read/lock AGI
xfs_iget(tp, NORETRY|UNTRUSTED)
  cache hit
    can't get inode because, say, NEEDGC
    return -EAGAIN
  cache miss
    xfs_imap
      xfs_imap_lookup() (because UNTRUSTED)
        read AGI
	  works only because AGI is locked in this transaction.

Ok, so the -EAGAIN return is only going to come through cache hit
path, the cache miss path is not going to deadlock even though it
needs the AGI we already have locked. Incore lookup will fail with
EAGAIN if inactivation on the inode is required, and that may remove
the inode from the unlinked list and hence need the AGI. Hence
xchk_iget_agi() needs to drop the AGI lock on EAGAIN to allow
inodegc to make progress and move the inode to IRECLAIMABLE state
where xfs_iget will be able to return it again if it can lock the
inode.

OK. Nasty, but looks like it is deadlock free. The comment above the
function needs to document this special case handling for xfs_iget()
returning -EAGAIN and the dependence on having a valid transaction
context for the cache miss case to avoid deadlocking in
xfs_imap_lookup() on the AGI lock.

---

Looks like a lot of commonality between xchk_setup_inode() and
xchk_iget_for_scrubbing() - same xfs-iget/xfs_iget_agi/xfs_imap
checking logic - so maybe scope for a common helper function there?

---

scrub/readdir.c

This looks to be a reimplementation of the normal readdir code just
with a different callback to process the names that are found. I'd
prefer not to have two copies of the readdir implementation in the
code base - is it possible to use the filldir context and the
existing readdir code to perform the scrub readdir walk function?

Maybe that is for a later patchset....

---

That's all I've noticed reading through the code - I'm not going to
find off-by-one errors in the logic reading through a diff of this
size:

73 files changed, 4976 insertions(+), 1794 deletions(-)

The code changes largely make sense, the impact on the code outside
fs/xfs/scrub appears to be largely minimal and there are some good
cleanups and improvements in the code. I'll start running this
through my limited QA resources right now (hardware failure has
taken out my large test machine) but if it's anything like the
previous version, I don't expect to find any regressions.

I'll probably start on the second set of commits tomorrow....

Cheers,

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux