On Tue, Nov 28, 2017 at 11:05 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> On Tue, Nov 28, 2017 at 10:40:24PM +0200, Amir Goldstein wrote:
>> On Tue, Nov 28, 2017 at 9:29 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> > On Tue, Nov 28, 2017 at 7:30 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>> >> From: Josef Bacik <jbacik@xxxxxx>
>> >>
>> >> Amir noticed that sometimes the xfstests using dm-log-writes would fail
>> >> randomly but would work fine after trying again manually. This is
>> >> because dm-log-writes writes directly to the device, but the log replay
>> >> tools read and write via the block device page cache. Sometimes this
>> >> resulted in stale data being in the block device's page cache which
>> >> would result in random failures. To handle this simply invalidate the
>> >> block device page cache on destruction so any replay of the log device
>> >> that follows will be forced to read the new real contents.
>> >>
>> >> Reported-and-tested-by: Amir Goldstein <amir73il@xxxxxxxxx>
>> >
>> > I'm fine with the Reported-by, but let's wait a while with this patch so
>> > I have more time to torture it.
>> > The incidents I got even before the patch did not happen more than
>> > a handful of times after running for a few days, so I need some more
>> > days to validate the fix.
>> > I had already sent you some weird output. Let's see what else comes
>> > along.
>> >
>>
>> Sorry, no cigar.
>> Another run just completed with Malformed log and corrupted fs
>>
>> The _check_scratch_fs that fails is the one right after _log_writes_remove
>> just like the report that I sent before this patch
>> and the LOGWRITES_DEV itself has malformed entry before the "end" mark
>> or even the last fsync mark:
>>
>> ./src/log-writes/replay-log -v --log $LOGWRITES_DEV --find --end-mark
>> testfile1.mark17
>> Malformed entry @112134
>>
>> For what its worth, I am testing on spinning disks, 100G scratch dev.
>> Right now, I zoomed in on the following fsx seeds that managed to fail the test
>> a few times already, but in different ways, so I'm not sure the seeds are more
>> than voodoo:
>> seeds=(4597 4598 4599 4600)
>>
>> I'll start running the same test but with fsx running on test partition, just
>> to get the feel for running the same fsx threads on bare xfs.
>>
>> Any other ideas?
>>
>
> Is there anything special about your devices? Are they 4k drives? The corrupt
> log is not awesome, was it still corrupt after the test bailed out? Thanks,
>

No, nothing special. Boring 4TB WD drive.

I just reported on the xfstests thread that the problem was reproduced with
xfs on the scratch partition, where dm-log-writes is not in the picture,
so for now, dm-log-writes is off the hook.

I still need to explain the malformed log, but I will follow the xfs
corruption lead first.

Thanks,
Amir.