On Wed, 10 Dec 2014, Josef Bacik wrote:
> On 12/10/2014 06:27 AM, Jan Kara wrote:
> > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > Hello,
> > >
> > > We have been doing pretty well at populating xfstests with loads of
> > > tests to catch regressions and validate we're all working properly.
> > > One thing that has been lacking is a good way to verify file system
> > > integrity after a power fail. This is a core part of what file
> > > systems are supposed to provide but it is probably the least tested
> > > aspect. We have dm-flakey tests in xfstests to test fsync
> > > correctness, but these tests do not catch the random horrible things
> > > that can go wrong. We are still finding horrible scary things that
> > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > for.
> > >
> > > I have been working on an idea to do this better, some may have seen
> > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > thanks to discussions with Zach Brown. Obviously there will be a
> > > lot changing in this area in the time between now and March but it
> > > would be good to have everybody in the room talking about what they
> > > would need to build a good and deterministic test to make sure we're
> > > always giving a consistent file system and to make sure our fsync()
> > > handling is working properly. Thanks,
> > I agree we are lacking in testing this aspect. Just I don't see too much
> > material for discussion there, unless we have something more tangible -
> > when we have some implementation, we can talk about pros and cons of it,
> > what still needs doing etc.
> >
>
> Right that's what I was getting at. I have a solution and have sent it around
> but there doesn't seem to be too many people interested in commenting on it.
> I figure one of two things will happen
>
> 1) My solution will go in before LSF, in which case YAY my job is done and
> this is more of an [ATTEND] than a [TOPIC], or
>
> 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> how we can integrate it into xfstests, future features, other areas we could
> test etc.
>
> Maybe not a full blown slot but combined with a overall testing slot or hell
> just a quick lightening talk. Thanks,

I have a related topic that may make sense to fit into any discussion
about this. Twice recently we've run into trouble using newish or less
common (combinations of) syscalls.

The first instance was with the use of sync_file_range to try to
control/limit the amount of dirty data in the page cache. This, possibly
in combination with posix_fadvise(DONTNEED), managed to break the
writeback sequence in XFS and led to data corruption after power loss.

The other issue we saw was just a general raft of FIEMAP bugs over the
last year or two. We saw cases where even after fsync a fiemap result
would not include all extents, and (not unexpectedly) lots of corner
cases in several file systems, e.g., around partial blocks at end of
file. (As far as I know everything we saw is resolved in current
kernels.)

I'm not so concerned with these specific bugs, but I am worried that we
(perhaps naively) expected these calls to be pretty safe. Perhaps for
FIEMAP this is a general case where a newish syscall/ioctl should be
tested carefully with our workloads before being relied upon, and we
could have worked to make sure e.g. xfstests has appropriate tests. For
power fail testing in particular, though, right now it isn't clear who
is testing what under what workloads, so the only really "safe" approach
is to stick to whatever syscall combinations we think the rest of the
world is using, or make sure we test ourselves. As things stand now the
other devs are loath to touch any remotely exotic fs call, but that
hardly seems ideal.
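For anyone who hasn't used it, the sync_file_range/DONTNEED throttling pattern mentioned above looks roughly like the sketch below. It assumes Linux and glibc; the helper name write_throttled, the filler data and the chunk size are made up for illustration, and this is not the code that actually hit the XFS problem. Each chunk gets asynchronous writeback started right after it is written; the previous chunk is waited on and then dropped from the page cache.

```c
#define _GNU_SOURCE /* sync_file_range() is Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `total` bytes of filler data to `path` in
 * `chunk`-sized pieces while keeping little dirty data in the page cache.
 * Returns the number of bytes written, or -1 on error. Short writes are
 * treated as errors for brevity.
 */
long long write_throttled(const char *path, size_t total, size_t chunk)
{
    assert(chunk > 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    char *buf = malloc(chunk);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', chunk);

    off_t off = 0;
    off_t prev_off = -1; /* start of the previously written chunk, if any */
    size_t prev_len = 0;

    while ((size_t)off < total) {
        size_t left = total - (size_t)off;
        size_t len = left < chunk ? left : chunk;
        if (pwrite(fd, buf, len, off) != (ssize_t)len)
            goto fail;

        /* Start asynchronous writeback of the chunk just written. */
        if (sync_file_range(fd, off, (off_t)len, SYNC_FILE_RANGE_WRITE) != 0)
            goto fail;

        if (prev_off >= 0) {
            /* Wait until the previous chunk's writeback has completed... */
            if (sync_file_range(fd, prev_off, (off_t)prev_len,
                                SYNC_FILE_RANGE_WAIT_BEFORE |
                                SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER) != 0)
                goto fail;
            /* ...then ask the kernel to drop those now-clean pages.
             * posix_fadvise() returns an error number, not -1. */
            if (posix_fadvise(fd, prev_off, (off_t)prev_len,
                              POSIX_FADV_DONTNEED) != 0)
                goto fail;
        }
        prev_off = off;
        prev_len = len;
        off += (off_t)len;
    }

    /* sync_file_range() writes no metadata and does not flush the disk's
     * write cache, so it gives no crash-durability guarantee by itself;
     * fsync() is still required before relying on the data. */
    if (fsync(fd) != 0)
        goto fail;

    free(buf);
    if (close(fd) != 0)
        return -1;
    return (long long)off;

fail:
    free(buf);
    close(fd);
    return -1;
}
```

Waiting on the previous chunk before issuing DONTNEED matters because Linux does not drop pages that are still dirty or under writeback, so advising on a chunk whose I/O is still in flight mostly does nothing. The final fsync() is the only step here that is meant to make the data survive a crash.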
Hopefully a common framework for power fail testing can improve on this.
Perhaps there are also other ways we can make it easier to tell what is
(well) tested, and conversely ensure that those tests are well-aligned
with what real users are doing...

sage