On Fri, Oct 11, 2024 at 10:13:03AM -0700, Darrick J. Wong wrote:
> On Fri, Oct 11, 2024 at 10:02:16AM -0400, Brian Foster wrote:
> > On Fri, Oct 11, 2024 at 09:57:09AM +0200, Christoph Hellwig wrote:
> > > On Thu, Oct 10, 2024 at 10:05:53AM -0400, Brian Foster wrote:
> > > > Ok, so we don't want geometry change transactions in the same
> > > > CIL checkpoint as alloc-related transactions that might depend
> > > > on the geometry changes. That seems reasonable, and on a first
> > > > pass I have an idea of what this is doing, but the description
> > > > is kind of vague. Obviously this fixes an issue on the recovery
> > > > side (since I've tested it), but it's not quite clear to me
> > > > from the description and/or logic changes how that issue
> > > > manifests.
> > > >
> > > > Could you elaborate please? For example, is this some kind of
> > > > race situation between an allocation request and a growfs
> > > > transaction, where the former perhaps sees a newly added AG
> > > > between the time the growfs transaction commits (applying the
> > > > sb deltas) and it actually commits to the log due to being a
> > > > sync transaction, thus allowing an alloc on a new AG into the
> > > > same checkpoint that adds the AG?
> > >
> > > This is based on the feedback by Dave on the previous version:
> > >
> > > https://lore.kernel.org/linux-xfs/Zut51Ftv%2F46Oj386@xxxxxxxxxxxxxxxxxxx/
> > >
> >
> > Ah, Ok. That all seems reasonably sane to me on a first pass, but
> > I'm not sure I'd go straight to this change given the situation...
> >
> > > Just doing the perag/in-core sb updates earlier fixes all the
> > > issues with my test case, so I'm not actually sure how to get
> > > more updates into the same checkpoint. I'll try your exercisers
> > > to see if they can hit that.
> > >
> >
> > Ok, that explains things a bit. My observation is that the first 5
> > patches or so address the mount failure problem, but from there
> > I'm not reproducing much difference with or without the final
> > patch.
>
> Does this change to flush the log after committing the new sb fix
> the recovery problems on older kernels? I /think/ that's the point
> of this patch.
>

I don't follow.. growfs has always forced the log via the sync
transaction, right? Or do you mean something else by "change to flush
the log?"

I thought the main functional change of this patch was to hold the
superblock buffer locked across the force so nothing else can relog
the new geometry superblock buffer into the same CIL checkpoint.
Presumably, the theory is that this prevents recovery from seeing
updates to different buffers that depend on the geometry update
before the actual sb geometry update is recovered (because the latter
might have been relogged).

Maybe we're saying the same thing..? Or maybe I just misunderstand.
Either way, I think the patch could use a more detailed commit log...
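
To make sure we're describing the same ordering, here's roughly the
pattern I read the patch as implementing. Sketch of my understanding
only, not the actual patch; xfs_growfs_commit_geometry is a made-up
name:

/*
 * Rough sketch: commit the geometry update synchronously while
 * keeping the primary sb buffer locked, so nothing can relog that
 * buffer into the same CIL checkpoint as the geometry change.
 */
STATIC int
xfs_growfs_commit_geometry(
	struct xfs_trans	*tp,
	struct xfs_buf		*sb_bp)
{
	int			error;

	/* keep sb_bp locked (and referenced) across the commit */
	xfs_trans_bhold(tp, sb_bp);

	/* a sync commit forces the log before returning */
	xfs_trans_set_sync(tp);
	error = xfs_trans_commit(tp);

	/*
	 * The checkpoint with the sb change is now on disk. Only at
	 * this point can another transaction lock sb_bp and relog it,
	 * which therefore lands in a later checkpoint.
	 */
	xfs_buf_relse(sb_bp);
	return error;
}
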
> > Either way, I see aborts and splats all over the place, which
> > implies at minimum this isn't the only issue here.
>
> Ugh. I've recently noticed the long soak logrecovery test vm has
> seen a slight uptick in failure rates -- random blocks that have
> clearly had garbage written to them such that recovery tries to read
> the block to recover a buffer log item and kaboom. At this point
> it's unclear if that's a problem with xfs or somewhere else. :(
>
> > So given that 1. growfs recovery seems pretty much broken, 2. this
> > particular patch has no straightforward way to test that it fixes
> > something and at the same time doesn't break anything else, and
> > 3. we do
>
> I'm curious, what might break? Was that merely a general comment, or
> do you have something specific in mind? (iows: do you see more
> string to pull? :))
>

Just a general comment..

Something related that isn't totally clear to me is the inverse
shrink situation, where dblocks is reduced. I.e., is there some
similar scenario where, for example, instead of the sb buffer being
relogged past some other buffer update that depends on it, some other
change is relogged past an sb update that invalidates/removes blocks
referenced by the relogged buffer..? If so, does that imply a shrink
should flush the log before the shrink transaction commits, to ensure
it lands in a new checkpoint (as opposed to ensuring followon updates
land in a new checkpoint)..? (See the rough sketch at the end of this
mail for the ordering I mean.)

Anyways, my point is just that if it were me, I wouldn't get too deep
into this until some of the reproducible growfs recovery issues are
at least characterized and testing is more sorted out. The context
for testing is here [1]. The TLDR is basically that Christoph has a
targeted test that reproduces the initial mount failure, and I hacked
up a more general test that also reproduces it along with additional
growfs recovery problems. This test does seem to confirm that the
previous patches address the mount failure issue, but this patch
doesn't seem to prevent any of the other problems produced by the
generic test. That might just mean the test doesn't reproduce what
this patch fixes, but it's hard to regression test something like
this when basic growfs crash-recovery seems pretty much broken.

Brian

[1] https://lore.kernel.org/fstests/ZwVdtXUSwEXRpcuQ@bfoster/

> > have at least one fairly straightforward growfs/recovery test in
> > the works that reliably explodes, personally I'd suggest splitting
> > this work off into a separate series.
> >
> > It seems reasonable enough to me to get patches 1-5 in asap once
> > they're fully cleaned up, and then leave the next two as part of a
> > followon series pending further investigation into these other
> > issues. As part of that I'd like to know whether the recovery test
> > reproduces (or can be made to reproduce) the issue this patch
> > presumably fixes, but I'd also settle for "the grow recovery test
> > now passes reliably and this doesn't regress it." But once again,
> > just my .02.
>
> Yeah, it's too bad there's no good way to test recovery with older
> kernels either. :(
>
> --D
>
> > Brian
> >
> >
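
(Appendix, re the shrink question above: this is the ordering I was
wondering about. Purely hypothetical sketch -- there's no
xfs_shrinkfs_commit_geometry, and this glosses over transactions that
could sneak in between the force and the commit:)

STATIC int
xfs_shrinkfs_commit_geometry(
	struct xfs_mount	*mp,
	struct xfs_trans	*tp)
{
	int			error;

	/*
	 * Push everything already committed into its own checkpoint
	 * before the geometry update commits, so no earlier buffer
	 * update referencing the outgoing blocks can share a
	 * checkpoint with the sb update that removes them.
	 */
	error = xfs_log_force(mp, XFS_LOG_SYNC);
	if (error) {
		xfs_trans_cancel(tp);
		return error;
	}

	/* the geometry update opens a fresh checkpoint */
	xfs_trans_set_sync(tp);
	return xfs_trans_commit(tp);
}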