Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing

On Wed, 10 Dec 2014, Josef Bacik wrote:
> On 12/10/2014 06:27 AM, Jan Kara wrote:
> > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > Hello,
> > > 
> > > We have been doing pretty well at populating xfstests with loads of
> > > tests to catch regressions and validate we're all working properly.
> > > One thing that has been lacking is a good way to verify file system
> > > integrity after a power fail.  This is a core part of what file
> > > systems are supposed to provide but it is probably the least tested
> > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > correctness, but these tests do not catch the random horrible things
> > > that can go wrong.  We are still finding horrible scary things that
> > > go wrong in Btrfs because they are simply hard to reproduce and test
> > > for.
> > > 
> > > I have been working on an idea to do this better, some may have seen
> > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > thanks to discussions with Zach Brown.  Obviously there will be a
> > > lot changing in this area in the time between now and March but it
> > > would be good to have everybody in the room talking about what they
> > > would need to build a good and deterministic test to make sure we're
> > > always giving a consistent file system and to make sure our fsync()
> > > handling is working properly.  Thanks,
> >    I agree we are lacking in testing this aspect. I just don't see much
> > material for discussion there unless we have something more tangible.
> > Once we have some implementation, we can talk about its pros and cons,
> > what still needs doing, etc.
> > 
> 
> Right, that's what I was getting at.  I have a solution and have sent it around
> but there don't seem to be many people interested in commenting on it.
> I figure one of two things will happen:
> 
> 1) My solution will go in before LSF, in which case YAY my job is done and
> this is more of an [ATTEND] than a [TOPIC], or
> 
> 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> how we can integrate it into xfstests, future features, other areas we could
> test etc.
> 
> Maybe not a full-blown slot but combined with an overall testing slot or hell
> just a quick lightning talk.  Thanks,

I have a related topic that may make sense to fit into any discussion 
about this. Twice recently we've run into trouble using newish or less 
common (combinations of) syscalls.

The first instance was with the use of sync_file_range to try to 
control/limit the amount of dirty data in the page cache.  This, possibly 
in combination with posix_fadvise(DONTNEED), managed to break the 
writeback sequence in XFS and led to data corruption after power loss.
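
For anyone who has not seen the pattern, here is a minimal sketch of the
kind of syscall combination meant above. It is an illustration only, not
the code that hit the bug: the helper names and the window size are
hypothetical, and it is written in Python calling libc's
sync_file_range() through ctypes purely to keep it short (Linux only).

```python
import ctypes
import os

# Flag values from <linux/fs.h>.
SYNC_FILE_RANGE_WAIT_BEFORE = 1
SYNC_FILE_RANGE_WRITE = 2
SYNC_FILE_RANGE_WAIT_AFTER = 4


def sync_file_range(fd, offset, nbytes, flags):
    """Thin ctypes wrapper around libc's sync_file_range(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    fn = libc.sync_file_range
    fn.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    fn.restype = ctypes.c_int
    if fn(fd, offset, nbytes, flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def write_with_bounded_dirty(path, chunks, window=1 << 20):
    """Write chunks to path while keeping roughly two windows of dirty
    or cached data around (hypothetical helper, for illustration)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0   # total bytes written so far
        flushed = 0   # start of the first window not yet submitted
        for chunk in chunks:
            view = memoryview(chunk)
            while view:
                n = os.write(fd, view)
                view = view[n:]
                written += n
            while written - flushed >= window:
                # Start asynchronous writeback of the newest full window.
                sync_file_range(fd, flushed, window, SYNC_FILE_RANGE_WRITE)
                if flushed >= window:
                    prev = flushed - window
                    # Wait for the previous window to hit the device,
                    # then ask the kernel to drop it from the page cache.
                    sync_file_range(fd, prev, window,
                                    SYNC_FILE_RANGE_WAIT_BEFORE
                                    | SYNC_FILE_RANGE_WRITE
                                    | SYNC_FILE_RANGE_WAIT_AFTER)
                    os.posix_fadvise(fd, prev, window,
                                     os.POSIX_FADV_DONTNEED)
                flushed += window
        # sync_file_range() gives no durability guarantee (no metadata,
        # no device cache flush), so a real fsync() is still required.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

The point of the sketch is only that none of these calls is exotic on its
own; it is the combination, under power loss, that nobody seems to be
testing.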

The other issue we saw was just a general raft of FIEMAP bugs over the 
last year or two. We saw cases where even after fsync a fiemap result 
would not include all extents, and (not unexpectedly) lots of corner cases 
in several file systems, e.g., around partial blocks at end of file.  (As 
far as I know everything we saw is resolved in current kernels.)
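
A check along these lines is the sort of thing that could live in a
test suite: write a file with no holes, fsync it, then verify that the
FIEMAP result covers every byte up to the file size. This is a rough
sketch under stated assumptions, not an existing xfstests test. The
function names are hypothetical, the ioctl number assumes the common
ioctl encoding used on x86 and ARM, and file systems that do not
implement FIEMAP will raise OSError.

```python
import fcntl
import os
import struct

FS_IOC_FIEMAP = 0xC020660B           # _IOWR('f', 11, struct fiemap)
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF
FIEMAP_EXTENT_LAST = 0x1
HEADER = struct.Struct("=QQIIII")     # struct fiemap without extents
EXTENT = struct.Struct("=QQQQQIIII")  # struct fiemap_extent


def fiemap_extents(fd, max_extents=64):
    """Return [(logical, physical, length, flags), ...] for the file."""
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    # fm_start=0, fm_length=whole file, fm_flags=0, fm_extent_count set.
    HEADER.pack_into(buf, 0, 0, FIEMAP_MAX_OFFSET, 0, 0, max_extents, 0)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((fields[0], fields[1], fields[2], fields[5]))
    if mapped == max_extents and not extents[-1][3] & FIEMAP_EXTENT_LAST:
        raise RuntimeError("more extents than max_extents; result truncated")
    return extents


def uncovered_ranges(extents, size):
    """Return the byte ranges in [0, size) that no extent covers."""
    gaps = []
    pos = 0
    for logical, _physical, length, _flags in sorted(extents):
        if pos >= size:
            break
        if logical > pos:
            gaps.append((pos, min(logical, size)))
        pos = max(pos, logical + length)
    if pos < size:
        gaps.append((pos, size))
    return gaps


def check_after_fsync(path, payload):
    """Write payload, fsync, and report any range FIEMAP fails to map."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            view = view[os.write(fd, view):]
        os.fsync(fd)
        size = os.fstat(fd).st_size
        return uncovered_ranges(fiemap_extents(fd), size)
    finally:
        os.close(fd)
```

For a fully written file, check_after_fsync() returning a non-empty list
would be the "fiemap result does not include all extents" symptom. The
partial-block-at-EOF cases would need sizes that are not a multiple of
the block size.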

I'm not so concerned with these specific bugs, but worried that we 
(perhaps naively) expected these calls to be pretty safe.  Perhaps for FIEMAP 
this is a general case where a newish syscall/ioctl should be tested 
carefully with our workloads before being relied upon, and we could have 
worked to make sure e.g. xfstests has appropriate tests.  For power fail 
testing in particular, though, right now it isn't clear who is testing 
what under what workloads, so the only really "safe" approach is to stick 
to whatever syscall combinations we think the rest of the world is using, 
or make sure we test ourselves.

As things stand now the other devs are loath to touch any remotely exotic 
fs call, but that hardly seems ideal.  Hopefully a common framework for 
powerfail testing can improve on this.  Perhaps there are other ways we 
can make it easier to tell what is (well) tested, and conversely ensure that 
those tests are well-aligned with what real users are doing...

sage