On Tue, Nov 28, 2017 at 10:00 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> On Tue, Nov 28, 2017 at 09:32:59PM +0200, Amir Goldstein wrote:
>> On Tue, Nov 28, 2017 at 7:21 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>> > On Tue, Nov 28, 2017 at 06:48:43PM +0200, Amir Goldstein wrote:
>> >> On Mon, Nov 27, 2017 at 5:04 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>> >> > On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
>> >> >> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> >> >> > Cherry-picked the test from commit 70d41e17164b
>> >> >> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
>> >> >> > Quoting from Josef's commit message:
>> >> >> >
>> >> >> > The test just runs some ops and exits, then finds all of the good buffers
>> >> >> > in the directory we provided and:
>> >> >> > - replays up to the mark given
>> >> >> > - mounts the file system and compares the md5sum
>> >> >> > - unmounts and fsck's to check for metadata integrity
>> >> >> >
>> >> >> > dm-log-writes will pretend to do discard and the replay-log tool will
>> >> >> > replay it properly depending on the underlying device, either by writing
>> >> >> > 0's or actually calling the discard ioctl, so I've enabled discard in the
>> >> >> > test for maximum fun.
>> >> >> >
>> >> >> > [Amir:]
>> >> >> > - Removed unneeded _test_falloc_support dynamic FSX_OPTS
>> >> >> > - Fold repetitions into for loops
>> >> >> > - Added place holders for using constant random seeds
>> >> >> > - Add pre umount checkpoint
>> >> >> > - Add test to new 'replay' group
>> >> >> > - Address review comments by Eryu Guan
>> >> >> >
>> >> >> > Cc: Josef Bacik <jbacik@xxxxxx>
>> >> >> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
>> >> >>
>> >> >>
>> >> >> Josef,
>> >> >>
>> >> >> As you know, this test is now merged to xfstest as generic/455.
>> >> >> I have been running the test for a while on xfs and it occasionally
>> >> >> reports inconsistencies which I try to investigate.
>> >> >>
>> >> >> In some of the reports, it appears that dm-log-writes may be exhibiting
>> >> >> a reliability issue (see below).
>> >> >>
>> >> >
>> >> > It's not a reliability issue, it's a caching issue. dm-log-writes is just
>> >> > issuing bio's to the log device, and our destructor waits for all pending io
>> >> > blocks to complete before exiting, so unless I've missed how dm is destroying
>> >> > devices everything should be on disk.
>> >> >
>> >> > However since we replay in userspace we are going through the blockdevice's
>> >> > pagecache, so we could have stale pages left in place which is screwing us up.
>> >> > Will you try this patch and see if it fixes the problem? Thanks,
>> >> >
>> >> > Josef
>> >> >
>> >> >
>> >> > diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
>> >> > index 8b80a9ce9ea9..1c502930af5e 100644
>> >> > --- a/drivers/md/dm-log-writes.c
>> >> > +++ b/drivers/md/dm-log-writes.c
>> >> > @@ -545,6 +545,8 @@ static void log_writes_dtr(struct dm_target *ti)
>> >> >                    !atomic_read(&lc->pending_blocks));
>> >> >         kthread_stop(lc->log_kthread);
>> >> >
>> >> > +       invalidate_bdev(lc->logdev->bdev);
>> >> > +       invalidate_bdev(lc->dev->bdev);
>> >> >         WARN_ON(!list_empty(&lc->logging_blocks));
>> >> >         WARN_ON(!list_empty(&lc->unflushed_blocks));
>> >> >         dm_put_device(ti, lc->dev);
>> >>
>> >> Josef,
>> >>
>> >> With your patch OR with my xfstest patch that adds "sync" I did not yet see
>> >> another problem of garbage fs after _log_writes_remove.
>> >>
>> >> I did however, encounter this error (failure to verify read data during fsx)
>> >> from scratch/log-writes device (see attached full log).
>> >>
>> >> I will keep running the test to collect more information.
>> >>
>> >
>> > That failure I'll lay at the feet of whatever fs you are testing ;). I'm glad
>> > my patch fixed the replay problem, I'll send that up. Thanks,
>> >
>>
>> Uh oh! You are implying that xfs fails plain fsx and nobody noticed.
>> That is not where I would place my bet.
>>
>
> Why not? This is one of the only tests that runs multiple threads of fsx at the
> same time. dm-log-writes does nothing special with the actual target disk, it
> just clones the bio's as it gets them, sets the bdev for the normal device and
> sends them on their way, there's no reason to think we're breaking that.
> Furthermore fsx is in normal page cache mode, not O_DIRECT, so it's more likely
> something else is going wrong and not dm-log-writes, unless there is so little
> memory on the system that we are constantly reading cold-cache. Thanks,
>

Very well, I'll run the same test on bare xfs for comparison.

Amir.
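
For reference, the discard handling quoted from Josef's commit message (replay "either by writing 0's or actually calling the discard ioctl") could look roughly like the userspace sketch below. This is not the replay-log tool's code. The names replay_discard() and write_zeros() are hypothetical, and trying BLKDISCARD first and falling back to zeros on any failure is an assumption about how the choice might be made.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKDISCARD */

/* Hypothetical helper: overwrite [offset, offset + len) with zeros. */
int write_zeros(int fd, uint64_t offset, uint64_t len)
{
    static const char zeros[4096];   /* static storage, so all zero */

    while (len > 0) {
        size_t chunk = len < sizeof(zeros) ? (size_t)len : sizeof(zeros);
        ssize_t ret = pwrite(fd, zeros, chunk, (off_t)offset);

        if (ret < 0) {
            if (errno == EINTR)
                continue;            /* interrupted, retry the same chunk */
            return -1;
        }
        if (ret == 0)
            return -1;               /* no progress, give up */
        offset += (uint64_t)ret;
        len -= (uint64_t)ret;
    }
    return 0;
}

/*
 * Hypothetical helper: replay one logged discard on the replay target.
 * Assumption: try the real discard ioctl first, and if the target
 * rejects it (e.g. it is not a block device, or does not support
 * discard), emulate the discard by writing zeros over the range.
 */
int replay_discard(int fd, uint64_t offset, uint64_t len)
{
    uint64_t range[2] = { offset, len };   /* BLKDISCARD takes {start, length} in bytes */

    assert(fd >= 0);                       /* caller must pass an open descriptor */
    if (ioctl(fd, BLKDISCARD, range) == 0)
        return 0;
    return write_zeros(fd, offset, len);
}
```

On a regular file the ioctl is expected to fail, so the zero-writing path runs; on a block device that supports discard the ioctl path would be taken instead. Note that a real discard does not guarantee the range reads back as zeros on every device, so the two paths are not strictly equivalent.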