On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> On Wed, 10 Dec 2014, Josef Bacik wrote:
> > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > Hello,
> > > >
> > > > We have been doing pretty well at populating xfstests with loads of
> > > > tests to catch regressions and validate we're all working properly.
> > > > One thing that has been lacking is a good way to verify file system
> > > > integrity after a power fail.  This is a core part of what file
> > > > systems are supposed to provide but it is probably the least tested
> > > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > > correctness, but these tests do not catch the random horrible things
> > > > that can go wrong.  We are still finding horrible scary things that
> > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > for.
> > > >
> > > > I have been working on an idea to do this better, some may have seen
> > > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > > thanks to discussions with Zach Brown.  Obviously there will be a
> > > > lot changing in this area in the time between now and March but it
> > > > would be good to have everybody in the room talking about what they
> > > > would need to build a good and deterministic test to make sure we're
> > > > always giving a consistent file system and to make sure our fsync()
> > > > handling is working properly.  Thanks,
> > >    I agree we are lacking in testing this aspect. Just I don't see too much
> > > material for discussion there, unless we have something more tangible -
> > > when we have some implementation, we can talk about pros and cons of it,
> > > what still needs doing etc.
> > >
> >
> > Right that's what I was getting at.  I have a solution and have sent it around
> > but there doesn't seem to be too many people interested in commenting on it.
> > I figure one of two things will happen
> >
> > 1) My solution will go in before LSF, in which case YAY my job is done and
> > this is more of an [ATTEND] than a [TOPIC], or
> >
> > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > how we can integrate it into xfstests, future features, other areas we could
> > test etc.
> >
> > Maybe not a full blown slot but combined with an overall testing slot or hell
> > just a quick lightning talk.  Thanks,
>
> I have a related topic that may make sense to fit into any discussion
> about this.  Twice recently we've run into trouble using newish or less
> common (combinations of) syscalls.
>
> The first instance was with the use of sync_file_range to try to
> control/limit the amount of dirty data in the page cache.  This, possibly
> in combination with posix_fadvise(DONTNEED), managed to break the
> writeback sequence in XFS and led to data corruption after power loss.
>

Was there a report or any other details on this one? In particular, I'm
wondering if this is related to the problem exposed by xfstests test
xfs/053...

Brian

> The other issue we saw was just a general raft of FIEMAP bugs over the
> last year or two.  We saw cases where even after fsync a fiemap result
> would not include all extents, and (not unexpectedly) lots of corner cases
> in several file systems, e.g., around partial blocks at end of file.  (As
> far as I know everything we saw is resolved in current kernels.)
>
> I'm not so concerned with these specific bugs, but worried that we
> (perhaps naively) expected them to be pretty safe.  Perhaps for FIEMAP
> this is a general case where a newish syscall/ioctl should be tested
> carefully with our workloads before being relied upon, and we could have
> worked to make sure e.g. xfstests has appropriate tests.
> For power fail testing in particular, though, right now it isn't clear who
> is testing what under what workloads, so the only really "safe" approach is
> to stick to whatever syscall combinations we think the rest of the world is
> using, or make sure we test ourselves.
>
> As things stand now the other devs are loath to touch any remotely exotic
> fs call, but that hardly seems ideal.  Hopefully a common framework for
> powerfail testing can improve on this.  Perhaps there are other ways we
> can make it easier to tell what is (well) tested, and conversely ensure that
> those tests are well-aligned with what real users are doing...
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html