Re: [PATCH 2/3] xfs: restrict the h_size fixup in xlog_do_recovery_pass

Brian Foster <bfoster@xxxxxxxxxx> · Fri, 10 May 2024 08:34:35 -0400

On Tue, Apr 30, 2024 at 06:59:47AM -0400, Brian Foster wrote:
> On Mon, Apr 29, 2024 at 07:15:52PM +0200, Christoph Hellwig wrote:
> > On Mon, Apr 29, 2024 at 08:18:44AM -0400, Brian Foster wrote:
> > > > -		if (h_len > h_size && h_len <= log->l_mp->m_logbsize &&
> > > > +		if (!xfs_has_reflink(log->l_mp) && xfs_has_reflink(log->l_mp) &&
> > > > +		    h_len > h_size && h_len <= log->l_mp->m_logbsize &&
> > > 
> > > ... but I'm going to assume this hasn't been tested. ;) Do you mean to
> > > also check !rmapbt here?
> > 
> > Heh.  Well, it has been tested in that we don't do the fixup for the
> > reproducer fixed by the previous patch and in that xfstests still passes.
> > I guess nothing in there hits the old mkfs fixup, which isn't surprising.
> > 
> 
> Yeah.. (sorry, just teasing about the testing.. ;).
> 
> > > Can you please also just double check that we still handle the original
> > > mkfs problem correctly after these changes? I think that just means mkfs
> > > from a sufficiently old xfsprogs using a larger log stripe unit, and
> > > confirm the fs mounts (with a warning).
> > 
> > Yeah.  Is there any way to commit a fs image to xfstests so that we
> > actually regularly test for it?
> > 
> 
> Not sure.. ideally we could fuzz the log record header somehow or
> another to test for these various scenarios, since we clearly broke this
> once already.
> 
> I don't quite recall if I looked into that at the time of the original
> workaround. To Darrick's point, I wonder if there would be some use to
> an expert logformat command or something that allowed for some bonkers
> parameters (assuming something like that doesn't exist already).
> 
> I'm out on PTO for (at least) today, but I can take a closer look at
> that once I'm back and caught up...
> 

I've had a chance to poke at this a bit and so far I don't think it's as
straightforward as I'd hoped. The logformat command already exists,
which makes it pretty easy to malformat a log record, but the recovery
code only runs into this path on a dirty log. I suspect the original
issue that prompted this logic was something like a crash-recovery test
leading to log record trimming (i.e. simulated torn log writes) and then
happening onto previously mkfs-formatted log records that way, but that
is a guess at this point.

I did play around a bit with fuzzing h_size for those sorts of tests
(i.e. xfs/141), but that ran into other issues. I went back to using
xfsprogs v4.3 or so and that eventually also produces an unmountable
filesystem with a similar error (even with the h_size fix patch). It
fails to locate the head/tail, which is obviously necessary in order to
process the log and perform record validation, so I'm wondering if
something else changed on the kernel side to further gate this scenario.
Of course it's also possible I'm looking at things wrong and this is
just orthogonal to the original problem.

But given all of that, I am a little skeptical on the value of retaining
this logic at all. Does whatever test case that recently uncovered the
h_size problem leave a mountable filesystem, or does it just work around
bad behavior to provide a more graceful failure condition?

Brian