On Wed, Feb 27, 2019 at 06:06:57AM +0200, Amir Goldstein wrote:
> On Wed, Feb 27, 2019 at 1:22 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Feb 26, 2019 at 11:10:02PM +0200, Amir Goldstein wrote:
> > > On Tue, Feb 26, 2019 at 8:14 PM Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > > >
> > > > The dm-log-writes mechanism runs a workload against a filesystem,
> > > > tracks underlying FUAs and restores the filesystem to various points
> > > > in time based on FUA marks. This allows fstests to check fs
> > > > consistency at various points and verify log recovery works as
> > > > expected.
> > > >
> > >
> > > Inaccurate. generic/482 restores to FUA points.
> > > generic/45[57] restore to user-defined points in time (marks).
> > > The dm-log-writes mechanism is capable of restoring either.
> > >
> > > > This mechanism does not play well with LSN-based log recovery
> > > > ordering behavior on XFS v5 superblocks, however. For example,
> > > > generic/482 can reproduce false positive corruptions based on
> > > > extent-to-btree conversion of an inode if the inode and associated
> > > > btree block are written back after different checkpoints. Even
> > > > though both items are logged correctly in the extent-to-btree
> > > > transaction, the btree block can be relogged (multiple times) and
> > > > only written back once when the filesystem unmounts. If the inode
> > > > was written back after the initial conversion, recovery points
> > > > between that mark and when the btree block is ultimately written
> > > > back will show corruption because log recovery sees that the
> > > > destination buffer is newer than the recovered buffer and
> > > > intentionally skips the buffer. This is a false positive because
> > > > the destination buffer was resiliently written back after being
> > > > physically relogged one or more times.
> > > >
> > >
> > > This story doesn't add up.
> > > Either dm-log-writes emulates power failure correctly or it doesn't.
> > > My understanding is that the issue you are seeing is a result of
> > > XFS seeing "data from the future" after a restore of a power failure
> > > snapshot, because the scratch device is not a clean slate.
> > > If I am right, then the correct solution is to wipe the journal before
> > > starting to replay restore points.
> >
> > If that is the problem, then I think we should be wiping the entire
> > block device before replaying the recorded logwrite.
> >
>
> Indeed.
>
> > i.e. this sounds like a "block device we are replaying onto has
> > stale data in it" problem because we are replaying the same
> > filesystem over the top of itself. Hence there are no unique
> > identifiers in the metadata that can detect stale metadata in
> > the block device.
> >
> > I'm surprised that we haven't tripped over this much earlier than
> > this...
> >
>
> I remember asking myself the same thing... it's coming back to me
> now. I really remember having this discussion during test review.
> generic/482 is an adaptation of Josef's test script [1], which
> does log recovery onto a snapshot on every FUA checkpoint.
>
> [1] https://github.com/josefbacik/log-writes/blob/master/replay-fsck-wrapper.sh
>
> Setting up snapshots for every checkpoint was found empirically to take
> more test runtime than replaying the log from the start for each
> checkpoint. That observation was limited to the systems that Qu and
> Eryu tested on.
>
> IIRC, what usually took care of cleaning the block device was replaying
> the "discard everything" IO from mkfs time. The dm-log-writes driver
> should take care of zeroing blocks upon replaying a discard IO even on
> a target device that doesn't support discard, but maybe if the original
> device doesn't support discard, the IO is not recorded at all and
> therefore not replayed?
>
> Brian, can you check if your log-writes stream contains a discard IO for
> the entire block device?
> I do remember that the initial log-writes tests worked
> very reliably on my laptop with SSD and were a bit flaky on another
> test machine with spinning rust, but that machine had other hardware
> reliability issues at the time (bad SATA cable) so I attributed all
> issues to that problem.
>

FYI, the command from your other mail on a logwrites dev that demonstrates
this problem shows the following:

# ./src/log-writes/replay-log -vv --find --end-mark mkfs --log /dev/test/tmp | grep DISCARD
seek entry 0@2: 0, size 8388607, flags 0x4(DISCARD)
seek entry 1@3: 8388607, size 8388607, flags 0x4(DISCARD)
seek entry 2@4: 16777214, size 8388607, flags 0x4(DISCARD)
seek entry 3@5: 25165821, size 6291459, flags 0x4(DISCARD)

... which appears to cover the entire device. Is the intention that this
should wipe the scratch device?

Brian

> BTW, looking closer at generic/482, $prev does not seem to be used at all
> in the replay loop.
>
> > > > Update the dm-log-writes require checks to enforce v4 superblocks
> > > > when running against XFS and skip the test otherwise.
> > >
> > > You might as well disable dm-log-writes test for XFS completely.
> > > Who cares about v4 superblocks these days?
> >
> > Enough of the inflammatory hyperbole, Amir. Statements like this
> > serve no useful purpose.
> >
>
> <deep breath> Agreed.
>
> Thanks,
> Amir.
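Whether the DISCARD entries quoted in the mail really cover the whole device
can be checked mechanically. The sketch below is a hypothetical helper (not
part of fstests or the log-writes tools; the name `discard_coverage` and the
pasted sample lines are illustrative): it parses the `seek entry ... DISCARD`
lines from `replay-log -vv` output and merges the extents, so a single
resulting range means the discards tile the device with no gaps (offsets and
sizes in the units replay-log prints):

```python
import re

def discard_coverage(lines):
    """Merge DISCARD extents from `replay-log -vv` output into
    contiguous [start, end) ranges."""
    pat = re.compile(r":\s*(\d+),\s*size\s*(\d+),\s*flags\s*0x4\(DISCARD\)")
    extents = sorted((int(m.group(1)), int(m.group(2)))
                     for m in map(pat.search, lines) if m)
    merged = []
    for off, size in extents:
        if merged and off <= merged[-1][1]:
            # Adjacent or overlapping: extend the previous range.
            merged[-1][1] = max(merged[-1][1], off + size)
        else:
            merged.append([off, off + size])
    return merged

# The four DISCARD entries quoted in the mail above:
sample = [
    "seek entry 0@2: 0, size 8388607, flags 0x4(DISCARD)",
    "seek entry 1@3: 8388607, size 8388607, flags 0x4(DISCARD)",
    "seek entry 2@4: 16777214, size 8388607, flags 0x4(DISCARD)",
    "seek entry 3@5: 25165821, size 6291459, flags 0x4(DISCARD)",
]
print(discard_coverage(sample))  # [[0, 31457280]]: one gap-free extent
```

For the sample above the four extents merge into a single range starting at
offset 0, consistent with the observation in the mail that the recorded
discards appear to cover the entire device.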