Re: [PATCH] dm-log-writes: invalidate the bdev's for both of our devices

On Tue, Nov 28, 2017 at 11:05 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> On Tue, Nov 28, 2017 at 10:40:24PM +0200, Amir Goldstein wrote:
>> On Tue, Nov 28, 2017 at 9:29 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> > On Tue, Nov 28, 2017 at 7:30 PM, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>> >> From: Josef Bacik <jbacik@xxxxxx>
>> >>
>> >> Amir noticed that sometimes the xfstests using dm-log-writes would fail
>> >> randomly but would work fine after trying again manually.  This is
>> >> because dm-log-writes writes directly to the device, but the log replay
>> >> tools read and write via the block device page cache.  Sometimes this
>> >> resulted in stale data being in the block device's page cache which
>> >> would result in random failures.  To handle this simply invalidate the
>> >> block device page cache on destruction so any replay of the log device
>> >> that follows will be forced to read the new real contents.
>> >>
>> >> Reported-and-tested-by: Amir Goldstein <amir73il@xxxxxxxxx>
>> >
>> > I'm fine with the Reported-by, but let's wait a while with this patch so
>> > I have more time to torture it.
>> > The incidents I got even before the patch did not happen more than
>> > a handful of times after running for a few days, so I need some more
>> > days to validate the fix.
>> > I had already sent you some weird output. Let's see what else comes
>> > along.
>> >
>>
>> Sorry, no cigar.
>> Another run just completed with Malformed log and corrupted fs
>>
>> The _check_scratch_fs that fails is the one right after _log_writes_remove,
>> just like in the report that I sent before this patch,
>> and the LOGWRITES_DEV itself has a malformed entry before the "end" mark
>> or even the last fsync mark:
>>
>> ./src/log-writes/replay-log -v --log $LOGWRITES_DEV --find --end-mark
>> testfile1.mark17
>> Malformed entry @112134
>>
>> For what it's worth, I am testing on spinning disks, 100G scratch dev.
>> Right now, I zoomed in on the following fsx seeds that managed to fail the test
>> a few times already, but in different ways, so I'm not sure the seeds are more
>> than voodoo:
>> seeds=(4597 4598 4599 4600)
>>
>> I'll start running the same test but with fsx running on test partition, just
>> to get the feel for running the same fsx threads on bare xfs.
>>
>> Any other ideas?
>>
>
> Is there anything special about your devices?  Are they 4k drives?  The corrupt
> log is not awesome, was it still corrupt after the test bailed out?  Thanks,
>

No, nothing special. Boring 4TB WD drive.
I just reported on the xfstests thread that the problem was reproduced with
xfs on the scratch partition, where dm-log-writes is not in the picture, so
for now dm-log-writes is off the hook.
I still need to explain the malformed log, but will follow the xfs
corruption lead first.

Thanks,
Amir.
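
For anyone who wants to rule out stale block device page cache by hand
between test steps, here is a minimal userspace sketch of the same idea
the quoted patch description talks about. It is not the patch itself
(that lives in the kernel target's destructor); it only issues the
BLKFLSBUF ioctl, which asks the kernel to flush and invalidate the
cached pages of a block device. The function name drop_bdev_cache and
the injectable ioctl parameter are made up for illustration, and the
constant is hardcoded on the assumption of a Linux host, since Python's
fcntl module does not export it.

```python
import fcntl
import os

# BLKFLSBUF is _IO(0x12, 97) in <linux/fs.h>. Hardcoded here (assumption:
# Linux host) because Python's fcntl module does not export it.
BLKFLSBUF = 0x1261


def drop_bdev_cache(path, ioctl=fcntl.ioctl):
    """Flush and invalidate the page cache of the block device at `path`.

    `ioctl` is injectable only so the call can be exercised without root
    or a real block device; it defaults to the real fcntl.ioctl.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Needs CAP_SYS_ADMIN when issued against a real block device.
        ioctl(fd, BLKFLSBUF, 0)
    finally:
        os.close(fd)
```

Something like drop_bdev_cache(os.environ["LOGWRITES_DEV"]) as root
before running replay-log should be roughly equivalent to
"blockdev --flushbufs" on that device. Given that the corruption also
reproduced without dm-log-writes, this is a way to exclude one variable,
not an expected fix.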


