Re: [PATCH] generic: skip dm-log-writes tests on XFS v5 superblock filesystems

On Wed, Feb 27, 2019 at 08:18:39AM -0500, Brian Foster wrote:
> On Tue, Feb 26, 2019 at 11:10:02PM +0200, Amir Goldstein wrote:
> > On Tue, Feb 26, 2019 at 8:14 PM Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > >
> > > The dm-log-writes mechanism runs a workload against a filesystem,
> > > tracks underlying FUAs and restores the filesystem to various points
> > > in time based on FUA marks. This allows fstests to check fs
> > > consistency at various points and verify log recovery works as
> > > expected.
> > >
> > 
> > Inaccurate. generic/482 restores to FUA points.
> > generic/45[57] restore to user defined points in time (marks).
> > dm-log-writes mechanism is capable of restoring either.
> > 
> 
> The above is poorly worded. I'm aware of the separate tests and I've
> used the mechanism to bounce around to various marks. Note that my
> understanding of the mechanism beyond that is rudimentary. I'll reword
> this if the patch survives, but it sounds like there may be opportunity
> to fix the mechanism, which clearly would be ideal.
> 
> > > This mechanism does not play well with LSN based log recovery
> > > ordering behavior on XFS v5 superblocks, however. For example,
> > > generic/482 can reproduce false positive corruptions based on extent
> > > to btree conversion of an inode if the inode and associated btree
> > > block are written back after different checkpoints. Even though both
> > > items are logged correctly in the extent-to-btree transaction, the
> > > btree block can be relogged (multiple times) and only written back
> > > once when the filesystem unmounts. If the inode was written back
> > > after the initial conversion, recovery points between that mark and
> > > when the btree block is ultimately written back will show corruption
> > > because log recovery sees that the destination buffer is newer than
> > > the recovered buffer and intentionally skips the buffer. This is a
> > > false positive because the destination buffer was resiliently
> > > written back after being physically relogged one or more times.
> > >
> > 
> > This story doesn't add up.
> > Either dm-log-writes emulates power failure correctly or it doesn't.
> 
> It doesn't. It leaves the log and broader filesystem in a state that
> makes no sense with respect to a power failure.
> 
> > My understanding is that the issue you are seeing is a result of
> > XFS seeing "data from the future" after a restore of a power failure
> > snapshot, because the scratch device is not a clean slate.
> > If I am right, then the correct solution is to wipe the journal before
> > starting to replay restore points.
> > 
> > Am I misunderstanding what's going on?
> > 
> 
> Slightly. Wiping the journal will not help. I _think_ that a wipe of the
> broader filesystem before recovering from the initial FUA and replaying
> in order from there would mitigate the problem. Is there an easy way to
> test that theory? For example, would a mkfs of the scratch device before
> the replay sequence of generic/482 begins allow the test to still
> otherwise function correctly?
> 

FYI, I gave this a try and it didn't ultimately work because mkfs didn't
clear the device either. I ended up reproducing the problem, physically
zeroing the device, replaying the associated FUA and observing the
problem go away. From there, if I replay to the final FUA mark and go
back to the (originally) problematic FUA, the problem is reintroduced.
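
To illustrate the sequence above, here is a toy model in Python. It is only a sketch under stated assumptions: the block names, LSN values and helper functions (replay_to, log_recover, state_at_mark1) are hypothetical and do not correspond to real XFS or dm-log-writes code. It assumes that replaying to a mark only rewrites blocks recorded up to that mark, and that log recovery skips any buffer whose on-disk LSN is not older than the logged one.

```python
# Toy model only: all names and values are hypothetical, not XFS or
# dm-log-writes code.

def replay_to(disk, recorded_writes, mark):
    """Apply recorded writes up to and including `mark`.

    Blocks written only after `mark` are left untouched, so whatever the
    device already holds in them survives the replay.
    """
    for m, blkno, lsn, data in recorded_writes:
        if m <= mark:
            disk[blkno] = (lsn, data)
    return disk


def log_recover(disk, log_items):
    """Replay log items, skipping targets whose on-disk LSN is not older."""
    for lsn, blkno, data in log_items:
        disk_lsn, _ = disk.get(blkno, (0, None))
        if disk_lsn >= lsn:
            continue  # destination looks newer than the log item: skip it
        disk[blkno] = (lsn, data)
    return disk


# Writes recorded by the log device: (mark, block, lsn, data)
RECORDED = [
    (1, "inode", 1, "btree-format"),  # inode written back after checkpoint 1
    (3, "bmbt", 3, "bmbt-v3"),        # btree block written back at unmount
]

# Log contents visible at mark 1: (lsn, block, data)
LOG_AT_MARK1 = [(1, "inode", "btree-format"), (1, "bmbt", "bmbt-v1")]


def state_at_mark1(disk):
    """Replay the device to mark 1, then run log recovery on it."""
    return log_recover(replay_to(disk, RECORDED, 1), LOG_AT_MARK1)


# Zeroed device: recovery replays the btree block from the log.
clean = state_at_mark1({})

# Device previously replayed to the final mark: the btree block "from the
# future" is still there, so recovery skips it.
dirty = state_at_mark1(replay_to({}, RECORDED, 3))

print("clean:", clean["bmbt"])
print("dirty:", dirty["bmbt"])
```

In this model the zeroed device ends up with the btree block as logged at mark 1, while the previously replayed device keeps the later copy, which does not match the rest of the state at that mark.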

Brian

> I was going to elaborate further on the sequence of events, but I see
> Dave has already nicely described this generically in his most recent
> reply.
> 
> > IIRC, some of Josef's earlier versions used dm snapshots to restore
> > the blockdev to a clean state before replaying log-writes.
> > I think that one of the earlier versions of generic/482 also took
> > that approach, but that resulted in longer test runtime (not sure).
> > 
> > > Update the dm-log-writes require checks to enforce v4 superblocks
> > > when running against XFS and skip the test otherwise.
> > 
> > You might as well disable dm-log-writes test for XFS completely.
> > Who cares about v4 superblocks these days?
> > We need a tool to make sure the NEW features are crash resilient.
> > 
> > dm-log-writes proved itself to be a powerful generic test tool that found
> > some serious crash consistency bugs in every one of the major filesystems
> > and it found bugs with XFS reflink log recovery as well, so IMO
> > disabling dm-log-writes for v5 would be "very unwise!".
> > 
> 
> Thanks for the insight.
> 
> Brian
> 
> > Thanks,
> > Amir.


