On Fri, Oct 11, 2024 at 02:41:17PM -0400, Brian Foster wrote:
> On Fri, Oct 11, 2024 at 10:13:03AM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 11, 2024 at 10:02:16AM -0400, Brian Foster wrote:
> > > On Fri, Oct 11, 2024 at 09:57:09AM +0200, Christoph Hellwig wrote:
> > > > On Thu, Oct 10, 2024 at 10:05:53AM -0400, Brian Foster wrote:
> > > > > Ok, so we don't want geometry changes transactions in the same CIL
> > > > > checkpoint as alloc related transactions that might depend on the
> > > > > geometry changes. That seems reasonable and on a first pass I have
> > > > > an idea of what this is doing, but the description is kind of
> > > > > vague. Obviously this fixes an issue on the recovery side (since
> > > > > I've tested it), but it's not quite clear to me from the
> > > > > description and/or logic changes how that issue manifests.
> > > > >
> > > > > Could you elaborate please? For example, is this some kind of race
> > > > > situation between an allocation request and a growfs transaction,
> > > > > where the former perhaps sees a newly added AG between the time
> > > > > the growfs transaction commits (applying the sb deltas) and it
> > > > > actually commits to the log due to being a sync transaction, thus
> > > > > allowing an alloc on a new AG into the same checkpoint that adds
> > > > > the AG?
> > > >
> > > > This is based on the feedback by Dave on the previous version:
> > > >
> > > > https://lore.kernel.org/linux-xfs/Zut51Ftv%2F46Oj386@xxxxxxxxxxxxxxxxxxx/
> > > >
> > >
> > > Ah, Ok. That all seems reasonably sane to me on a first pass, but I'm
> > > not sure I'd go straight to this change given the situation...
> > >
> > > > Just doing the perag/in-core sb updates earlier fixes all the issues
> > > > with my test case, so I'm not actually sure how to get more updates
> > > > into the check checkpoint. I'll try your exercisers if it could hit
> > > > that.
> > > >
> > >
> > > Ok, that explains things a bit. My observation is that the first 5
> > > patches or so address the mount failure problem, but from there I'm
> > > not reproducing much difference with or without the final patch.
> >
> > Does this change to flush the log after committing the new sb fix the
> > recovery problems on older kernels?  I /think/ that's the point of this
> > patch.
> >
>
> I don't follow.. growfs always forced the log via the sync transaction,
> right? Or do you mean something else by "change to flush the log?"

I guess I was typing a bit too fast this morning -- "change to flush the
log to disk before anyone else can get their hands on the superblock".
You're right that xfs_log_sb and data-device growfs already do that.

That said, growfsrt doesn't call xfs_trans_set_sync, so that's a bug that
this patch fixes, right?

> I thought the main functional change of this patch was to hold the
> superblock buffer locked across the force so nothing else can relog the
> new geometry superblock buffer in the same cil checkpoint. Presumably,
> the theory is that prevents recovery from seeing updates to different
> buffers that depend on the geometry update before the actual sb geometry
> update is recovered (because the latter might have been relogged).
>
> Maybe we're saying the same thing..? Or maybe I just misunderstand.
> Either way I think patch could use a more detailed commit log...

<nod> The commit message should point out that we're fixing a real bug
here, which is that growfsrt doesn't force the log to disk when it
commits the new rt geometry.
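Something along these lines is what I have in mind, fwiw.  Totally
untested sketch with a made-up helper name (not the actual patch), but
it shows the ordering we've been talking about: log the new geometry,
hold the sb buffer, and make the commit synchronous so the checkpoint
is on disk before anyone else can lock and relog the superblock:

/*
 * Untested sketch: commit a geometry update synchronously while
 * holding the superblock buffer locked so that no later transaction
 * can relog the sb into the same CIL checkpoint.
 */
static int
xfs_growfs_log_geometry(
	struct xfs_trans	*tp)
{
	struct xfs_buf		*bp = xfs_trans_getsb(tp);
	int			error;

	xfs_log_sb(tp);			/* copy the in-core sb into the buffer and log it */
	xfs_trans_bhold(tp, bp);	/* keep the sb buffer locked across commit */
	xfs_trans_set_sync(tp);		/* commit waits for the log force to complete */

	error = xfs_trans_commit(tp);

	/* only now can anyone relog the sb, into a later checkpoint */
	xfs_buf_relse(bp);
	return error;
}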
> > > Either way,
> > > I see aborts and splats all over the place, which implies at minimum
> > > this isn't the only issue here.
> >
> > Ugh.  I've recently noticed the long soak logrecovery test vm have seen
> > a slight tick up in failure rates -- random blocks that have clearly
> > had garbage written to them such that recovery tries to read the block
> > to recover a buffer log item and kaboom.  At this point it's unclear if
> > that's a problem with xfs or somewhere else. :(
> >
> > > So given that 1. growfs recovery seems pretty much broken, 2. this
> > > particular patch has no straightforward way to test that it fixes
> > > something and at the same time doesn't break anything else, and 3. we do
> >
> > I'm curious, what might break?  Was that merely a general comment, or
> > do you have something specific in mind?  (iows: do you see more string
> > to pull? :))
> >
>
> Just a general comment..
>
> Something related that isn't totally clear to me is what about the
> inverse shrink situation where dblocks is reduced. I.e., is there some
> similar scenario where for example instead of the sb buffer being
> relogged past some other buffer update that depends on it, some other
> change is relogged past a sb update that invalidates/removes blocks
> referenced by the relogged buffer..? If so, does that imply a shrink
> should flush the log before the shrink transaction commits to ensure it
> lands in a new checkpoint (as opposed to ensuring followon updates land
> in a new checkpoint)..?

I think so.  Might we want to do that before and after to be careful?
(Rough sketch of what I mean tacked on at the bottom of this mail.)

> Anyways, my point is just that if it were me I wouldn't get too deep
> into this until some of the reproducible growfs recovery issues are at
> least characterized and testing is more sorted out.
>
> The context for testing is here [1]. The TLDR is basically that
> Christoph has a targeted test that reproduces the initial mount failure
> and I hacked up a more general test that also reproduces it and
> additional growfs recovery problems. This test does seem to confirm that
> the previous patches address the mount failure issue, but this patch
> doesn't seem to prevent any of the other problems produced by the
> generic test. That might just mean the test doesn't reproduce what this
> fixes, but it's kind of hard to at least regression test something like
> this when basic growfs crash-recovery seems pretty much broken.

Hmm, if you make a variant of that test which formats with an rt device
and -d rtinherit=1 and then runs xfs_growfs -R instead of -D, do you see
similar blowups?

Let's see what happens if I do that...

--D

> Brian
>
> [1] https://lore.kernel.org/fstests/ZwVdtXUSwEXRpcuQ@bfoster/
>
> > > have at least one fairly straightforward growfs/recovery test in the
> > > works that reliably explodes, personally I'd suggest to split this
> > > work off into separate series.
> > >
> > > It seems reasonable enough to me to get patches 1-5 in asap once
> > > they're fully cleaned up, and then leave the next two as part of a
> > > followon series pending further investigation into these other
> > > issues. As part of that I'd like to know whether the recovery test
> > > reproduces (or can be made to reproduce) the issue this patch
> > > presumably fixes, but I'd also settle for "the grow recovery test now
> > > passes reliably and this doesn't regress it." But once again, just my
> > > .02.
> >
> > Yeah, it's too bad there's no good way to test recovery with older
> > kernels either. :(
> >
> > --D
> >
> > > Brian
> > >
> > >
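(Coming back to the shrink question above -- "flush before and after"
would be something like this.  Again totally untested, made-up function
name, only sketching the checkpoint ordering, not tied to any real
shrink path:)

/*
 * Untested sketch: push everything already committed into an earlier
 * checkpoint, then commit the shrink itself synchronously so anything
 * that follows it lands in a later one.
 */
static int
xfs_shrinkfs_commit(
	struct xfs_trans	*tp)
{
	int			error;

	/* make sure all prior changes are on disk in an earlier checkpoint */
	error = xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
	if (error)
		return error;

	/* ...and force the log again once the shrink itself has committed */
	xfs_trans_set_sync(tp);
	return xfs_trans_commit(tp);
}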