Re: [LSF/MM TOPIC] Working towards better power fail testing

On 01/13/2015 12:05 PM, Dmitry Monakhov wrote:
Josef Bacik <jbacik@xxxxxx> writes:

Hello,

We have been doing pretty well at populating xfstests with loads of
tests to catch regressions and validate we're all working properly.  One
thing that has been lacking is a good way to verify file system
integrity after a power fail.  This is a core part of what file systems
are supposed to provide but it is probably the least tested aspect.  We
have dm-flakey tests in xfstests to test fsync correctness, but these
tests do not catch the random horrible things that can go wrong.  We are
still finding horrible scary things that go wrong in Btrfs because they
are simply hard to reproduce and test for.

I have been working on an idea to do this better, some may have seen my
dm-power-fail attempt, and I've got a new incarnation of the idea thanks
to discussions with Zach Brown.  Obviously there will be a lot changing
in this area in the time between now and March but it would be good to
have everybody in the room talking about what they would need to build a
good and deterministic test to make sure we're always giving a
consistent file system and to make sure our fsync() handling is working
properly.  Thanks,

I submitted generic/019 a long time ago. The test is fine and has helped
uncover several bugs, but it is not ideal because the current power
failure simulation (via fail_make_request) is not completely atomic.
So I would like to attend the discussion on how we can make power
failure simulation completely atomic.


Yeah, I did the first dm-flakey tests and extended them some. These are good baselines, but I've recently hit a few bugs in btrfs that we could only have reproduced by crashing at exactly the right spot, and that is what I want to build for: something that can run through all the possible crash scenarios to make sure we're always leaving a consistent fs.
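
To make "run through all the possible crash scenarios" concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how dm-power-fail or any xfstests test works: the log format, crash_images(), find_inconsistent() and the one-pointer "file system" are all hypothetical names made up for illustration. The assumption is that writes issued before the last flush are on media, while any subset of the writes issued since then may or may not have landed when power is cut.

```python
from itertools import combinations

# Hypothetical log format: ("write", block, data) or ("flush",).


def crash_images(initial, log):
    """Yield every on-media image a crash could leave behind.

    Assumed model: writes before the last flush are durable; any subset
    of the writes issued since the last flush may have reached the media.
    """
    durable = dict(initial)  # state guaranteed on media
    pending = []             # writes issued since the last flush
    yield dict(durable)      # crash before anything was issued
    for entry in log:
        if entry[0] == "flush":
            for block, data in pending:
                durable[block] = data
            pending = []
        else:
            _, block, data = entry
            pending.append((block, data))
        # Crash right after this entry: try every subset of pending writes.
        for k in range(len(pending) + 1):
            for subset in combinations(pending, k):
                image = dict(durable)
                for block, data in subset:
                    image[block] = data
                yield image


def find_inconsistent(initial, log, is_consistent):
    """Return the crash images that fail the consistency check."""
    return [img for img in crash_images(initial, log) if not is_consistent(img)]


# Toy "file system": block 0 holds the number of the live data block.
def pointer_is_valid(image):
    return int(image[0].decode()) in image


initial = {0: b"1", 1: b"old"}
# Data first, flush, then the pointer update: every crash image is valid.
with_flush = [("write", 2, b"new"), ("flush",), ("write", 0, b"2")]
# Same writes without the flush: the pointer can land before the data.
without_flush = [("write", 2, b"new"), ("write", 0, b"2")]

bad_with_flush = find_inconsistent(initial, with_flush, pointer_is_valid)
bad_without_flush = find_inconsistent(initial, without_flush, pointer_is_valid)
```

The subset enumeration is exponential in the number of writes between flushes, so a real tool would have to sample or prune, and a real check would be fsck plus verification of fsync()ed data rather than a one-line predicate.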

BTW, I would also like to share the hw-flush utility (which our QA team
uses for power-fail/SSD-cache testing) and the harness for it.


That would be super cool. The more testing we have around making sure we're waiting for stuff properly and flushing caches properly, the better. Thanks,

Josef