On Mon, Jan 05, 2015 at 11:13:28AM -0800, Sage Weil wrote:
> On Mon, 5 Jan 2015, Brian Foster wrote:
> > On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> > > On Wed, 10 Dec 2014, Josef Bacik wrote:
> > > > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > > > Hello,
> > > > > >
> > > > > > We have been doing pretty well at populating xfstests with loads
> > > > > > of tests to catch regressions and validate we're all working
> > > > > > properly. One thing that has been lacking is a good way to verify
> > > > > > file system integrity after a power fail. This is a core part of
> > > > > > what file systems are supposed to provide but it is probably the
> > > > > > least tested aspect. We have dm-flakey tests in xfstests to test
> > > > > > fsync correctness, but these tests do not catch the random
> > > > > > horrible things that can go wrong. We are still finding horrible
> > > > > > scary things that go wrong in Btrfs because it is simply hard to
> > > > > > reproduce and test for.
> > > > > >
> > > > > > I have been working on an idea to do this better, some may have
> > > > > > seen my dm-power-fail attempt, and I've got a new incarnation of
> > > > > > the idea thanks to discussions with Zach Brown. Obviously there
> > > > > > will be a lot changing in this area in the time between now and
> > > > > > March but it would be good to have everybody in the room talking
> > > > > > about what they would need to build a good and deterministic test
> > > > > > to make sure we're always giving a consistent file system and to
> > > > > > make sure our fsync() handling is working properly. Thanks,
> > > > >
> > > > > I agree we are lacking in testing this aspect. Just I don't see too
> > > > > much material for discussion there, unless we have something more
> > > > > tangible - when we have some implementation, we can talk about pros
> > > > > and cons of it, what still needs doing etc.
> > > >
> > > > Right, that's what I was getting at. I have a solution and have sent
> > > > it around, but there don't seem to be too many people interested in
> > > > commenting on it. I figure one of two things will happen:
> > > >
> > > > 1) My solution will go in before LSF, in which case YAY my job is
> > > > done and this is more of an [ATTEND] than a [TOPIC], or
> > > >
> > > > 2) My solution hasn't gone in yet and I'd like to discuss my
> > > > methodology and how we can integrate it into xfstests, future
> > > > features, other areas we could test, etc.
> > > >
> > > > Maybe not a full blown slot, but combined with an overall testing
> > > > slot, or hell, just a quick lightning talk. Thanks,
> > >
> > > I have a related topic that may make sense to fit into any discussion
> > > about this. Twice recently we've run into trouble using newish or less
> > > common (combinations of) syscalls.
> > >
> > > The first instance was with the use of sync_file_range to try to
> > > control/limit the amount of dirty data in the page cache. This,
> > > possibly in combination with posix_fadvise(DONTNEED), managed to break
> > > the writeback sequence in XFS and led to data corruption after power
> > > loss.
> >
> > Was there a report or any other details on this one? In particular, I'm
> > wondering if this is related to the problem exposed by xfstests test
> > xfs/053...
>
> This is the original thread:
>
> http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

Thanks. It does look similar to xfs/053, the intent of which was to
indirectly create the kind of writeback pattern that exposes this.

> Looks like 053 is about ACLs though?

generic/053 does something with ACLs; xfs/053 is the test of interest.
Regardless, from the thread above it sounds like Dave had homed in on the
cause.
Brian

> sage

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html