Re: [PATCH v3 10/13] fstests: crash consistency fsx test using dm-log-writes

On Tue, Nov 28, 2017 at 10:00 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> On Tue, Nov 28, 2017 at 09:32:59PM +0200, Amir Goldstein wrote:
>> On Tue, Nov 28, 2017 at 7:21 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>> > On Tue, Nov 28, 2017 at 06:48:43PM +0200, Amir Goldstein wrote:
>> >> On Mon, Nov 27, 2017 at 5:04 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>> >> > On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
>> >> >> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> >> >> > Cherry-picked the test from commit 70d41e17164b
>> >> >> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
>> >> >> > Quoting from Josef's commit message:
>> >> >> >
>> >> >> >   The test just runs some ops and exits, then finds all of the good buffers
>> >> >> >   in the directory we provided and:
>> >> >> >   - replays up to the mark given
>> >> >> >   - mounts the file system and compares the md5sum
>> >> >> >   - unmounts and runs fsck to check metadata integrity
>> >> >> >
>> >> >> >   dm-log-writes will pretend to do discard and the replay-log tool will
>> >> >> >   replay it properly depending on the underlying device, either by writing
>> >> >> >   0's or actually calling the discard ioctl, so I've enabled discard in the
>> >> >> >   test for maximum fun.
>> >> >> >
>> >> >> > [Amir:]
>> >> >> > - Remove unneeded _test_falloc_support dynamic FSX_OPTS
>> >> >> > - Fold repetitions into for loops
>> >> >> > - Add placeholders for using constant random seeds
>> >> >> > - Add pre-umount checkpoint
>> >> >> > - Add test to new 'replay' group
>> >> >> > - Address review comments by Eryu Guan
>> >> >> >
>> >> >> > Cc: Josef Bacik <jbacik@xxxxxx>
>> >> >> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
>> >> >>
>> >> >>
>> >> >> Josef,
>> >> >>
>> >> >> As you know, this test is now merged into xfstests as generic/455.
>> >> >> I have been running the test for a while on xfs, and it occasionally
>> >> >> reports inconsistencies, which I am trying to investigate.
>> >> >>
>> >> >> In some of the reports, it appears that dm-log-writes may be exhibiting
>> >> >> a reliability issue (see below).
>> >> >>
>> >> >
>> >> > It's not a reliability issue, it's a caching issue.  dm-log-writes is just
>> >> > issuing bios to the log device, and our destructor waits for all pending I/O
>> >> > blocks to complete before exiting, so unless I've missed how dm is destroying
>> >> > devices, everything should be on disk.
>> >> >
>> >> > However, since we replay in userspace, we are going through the block device's
>> >> > page cache, so we could have stale pages left in place that are screwing us up.
>> >> > Will you try this patch and see if it fixes the problem?  Thanks,
>> >> >
>> >> > Josef
>> >> >
>> >> >
>> >> > diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
>> >> > index 8b80a9ce9ea9..1c502930af5e 100644
>> >> > --- a/drivers/md/dm-log-writes.c
>> >> > +++ b/drivers/md/dm-log-writes.c
>> >> > @@ -545,6 +545,8 @@ static void log_writes_dtr(struct dm_target *ti)
>> >> >                    !atomic_read(&lc->pending_blocks));
>> >> >         kthread_stop(lc->log_kthread);
>> >> >
>> >> > +       invalidate_bdev(lc->logdev->bdev);
>> >> > +       invalidate_bdev(lc->dev->bdev);
>> >> >         WARN_ON(!list_empty(&lc->logging_blocks));
>> >> >         WARN_ON(!list_empty(&lc->unflushed_blocks));
>> >> >         dm_put_device(ti, lc->dev);
>> >>
>> >> Josef,
>> >>
>> >> With either your patch or my xfstests patch that adds "sync", I have not yet
>> >> seen another case of a garbage fs after _log_writes_remove.
>> >>
>> >> I did, however, encounter this error (failure to verify read data during fsx)
>> >> from the scratch/log-writes device (see attached full log).
>> >>
>> >> I will keep running the test to collect more information.
>> >>
>> >
>> > That failure I'll lay at the feet of whatever fs you are testing ;).  I'm glad
>> > my patch fixed the replay problem; I'll send that up.  Thanks,
>> >
>>
>> Uh oh! You are implying that xfs fails plain fsx and nobody noticed.
>> That is not where I would place my bet.
>>
>
> Why not?  This is one of the only tests that runs multiple threads of fsx at the
> same time.  dm-log-writes does nothing special with the actual target disk; it
> just clones the bios as it gets them, sets the bdev for the normal device and
> sends them on their way, so there's no reason to think we're breaking that.
> Furthermore, fsx runs in normal page cache mode, not O_DIRECT, so it's more likely
> that something else is going wrong and not dm-log-writes, unless there is so little
> memory on the system that we are constantly reading from a cold cache.  Thanks,
>

Very well, I'll run the same test on bare xfs for comparison.

Amir.
