On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> On Wed, 10 Dec 2014, Josef Bacik wrote:
> > On 12/10/2014 06:27 AM, Jan Kara wrote:
> The first instance was with the use of sync_file_range to try to
> control/limit the amount of dirty data in the page cache. This,
> possibly in combination with posix_fadvise(DONTNEED), managed to break
> the writeback sequence in XFS and led to data corruption after power
> loss.

Corruption after power loss is not brilliant behaviour from XFS here,
but I'll point out for the wider audience that sync_file_range()
provides absolutely no data integrity guarantees for power loss
situations. It's really, really badly named because it doesn't give the
same guarantees as the other "sync" functions filesystems provide.

IOWs, if you value your data, the only interface you can rely on for
data integrity is fsync/fdatasync...

> The other issue we saw was just a general raft of FIEMAP bugs over the
> last year or two. We saw cases where even after fsync a fiemap result
> would not include all extents, and (not unexpectedly) lots of corner
> cases in several file systems, e.g., around partial blocks at end of
> file. (As far as I know everything we saw is resolved in current
> kernels.)

Again, this is probably more a misunderstanding of FIEMAP than
anything. FIEMAP is *advisory* and gives no output accuracy guarantees,
as userspace cannot prevent the extent maps from changing at any time.
As an example, see the aborted attempt by the 'cp' utility to use
FIEMAP to detect holes when copying sparse files....

> I'm not so concerned with these specific bugs, but worried that we
> (perhaps naively) expected them to be pretty safe. Perhaps for FIEMAP
> this is a general case where a newish syscall/ioctl should be tested
> carefully with our workloads before being relied upon, and we could
> have worked to make sure e.g. xfstests has appropriate tests.

Oh, it does - that's why it mostly works now across all filesystems
that are regularly tested with xfstests.

> For power fail testing in particular, though, right now it isn't clear
> who is testing what under what workloads, so the only really "safe"
> approach is to stick to whatever syscall combinations we think the
> rest of the world is using, or make sure we test ourselves.

Write tests for the regression test suite that filesystem developers
run all the time. ;)

> As things stand now the other devs are loathe to touch any remotely
> exotic fs call, but that hardly seems ideal. Hopefully a common
> framework for powerfail testing can improve on this. Perhaps there are
> other ways we make it easier to tell what is (well) tested, and
> conversely ensure that those tests are well-aligned with what real
> users are doing...

We don't actually need power failure (or even device failure)
infrastructure to test data integrity on failure. Filesystems just need
a shutdown method that stops any IO from being issued once the shutdown
flag is set. XFS has this, and it's used by xfstests via the "godown"
utility to shut the filesystem down in various circumstances. We've
been using this for data integrity and log recovery testing in xfstests
for many years.

Hence we know that if the device behaves correctly w.r.t. cache flushes
and FUA, then the filesystem will behave correctly on power loss. We
don't need a device power fail simulator to tell us that violating
fundamental architectural assumptions will corrupt filesystems....
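To put the fsync/fdatasync point in concrete terms (an untested
illustration, not part of the original mail; the path names are made
up): sync_file_range() may kick off writeback, but only fsync/fdatasync
guarantee that the data and the metadata needed to reach it are on
stable storage, device cache flush included, by the time the call
returns.

/*
 * Untested sketch: durable write of a newly created file.  The path
 * names are placeholders and error handling is abbreviated.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void die(const char *msg) { perror(msg); exit(1); }

int main(void)
{
        const char *buf = "important data\n";

        int fd = open("/mnt/test/file.tmp",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
                die("open");

        if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf))
                die("write");

        /* The integrity barrier: force the data (and whatever metadata
         * is needed to find it again) to stable storage, device cache
         * flush included.  sync_file_range() does none of this. */
        if (fdatasync(fd) < 0)
                die("fdatasync");
        close(fd);

        /* A newly created or renamed file also needs its directory
         * entry made durable, so fsync the parent directory as well. */
        int dirfd = open("/mnt/test", O_RDONLY | O_DIRECTORY);
        if (dirfd < 0)
                die("open dir");
        if (fsync(dirfd) < 0)
                die("fsync dir");
        close(dirfd);

        return 0;
}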
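The same goes for the FIEMAP point above: the call itself is just an
ioctl, and the snapshot it returns can be stale the instant you look at
it. A rough illustration of querying it (again an untested sketch, not
from the original mail):

/*
 * Untested sketch: dump a file's extent map via FS_IOC_FIEMAP.  The
 * result is advisory only -- the filesystem is free to move, merge or
 * allocate extents at any time after the ioctl returns.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Room for 32 extents in a single call; real tools loop until
         * they see FIEMAP_EXTENT_LAST. */
        unsigned int count = 32;
        struct fiemap *fm = calloc(1, sizeof(*fm) +
                                   count * sizeof(struct fiemap_extent));
        if (!fm)
                return 1;

        fm->fm_start = 0;
        fm->fm_length = FIEMAP_MAX_OFFSET;      /* map the whole file */
        fm->fm_flags = FIEMAP_FLAG_SYNC;        /* write dirty data first */
        fm->fm_extent_count = count;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                perror("FS_IOC_FIEMAP");
                return 1;
        }

        for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
                struct fiemap_extent *fe = &fm->fm_extents[i];
                printf("logical %llu physical %llu length %llu flags 0x%x\n",
                       (unsigned long long)fe->fe_logical,
                       (unsigned long long)fe->fe_physical,
                       (unsigned long long)fe->fe_length,
                       fe->fe_flags);
        }

        free(fm);
        close(fd);
        return 0;
}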
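And the shutdown trap mentioned above is nothing exotic from userspace
either; xfstests' godown boils down to roughly this (a simplified,
untested sketch, assuming the XFS_IOC_GOINGDOWN definitions from
xfsprogs' <xfs/xfs_fs.h> are installed):

/*
 * Untested sketch of a godown-style shutdown trigger.  XFS_IOC_GOINGDOWN
 * and the flag values come from xfsprogs' <xfs/xfs_fs.h>; the fd can be
 * any open file or directory on the filesystem being shut down.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <xfs/xfs_fs.h>

int main(int argc, char **argv)
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
                return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* LOGFLUSH forces the log to disk and then fails all further IO,
         * which looks like a power cut to everything above the
         * filesystem.  NOLOGFLUSH skips the log force for a nastier
         * recovery test; DEFAULT flushes dirty data as well before
         * shutting down. */
        uint32_t flags = XFS_FSOP_GOING_FLAGS_LOGFLUSH;
        if (ioctl(fd, XFS_IOC_GOINGDOWN, &flags) < 0) {
                perror("XFS_IOC_GOINGDOWN");
                return 1;
        }

        close(fd);
        return 0;
}

After that, unmounting and remounting runs log recovery, and the test
can check what actually made it to disk.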
Unfortunately, nobody else seems to want to implement shutdown traps,
even though they massively improve reliability by enabling extensive
error path testing that can't otherwise be easily exercised...

So, from an xfstests perspective, I'd much prefer to see
XFS_IOC_GOINGDOWN implemented by other filesystems and have the tests
that use it made generic first. Filesystems need to handle themselves
correctly in simple error conditions before we even start to consider
creating esoteric failure conditions...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx