Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing

On Wed 07-01-15 09:07:06, Dave Chinner wrote:
> On Tue, Jan 06, 2015 at 09:53:47AM +0100, Jan Kara wrote:
> > On Tue 06-01-15 08:47:55, Dave Chinner wrote:
> > > > As things stand now the other devs are loath to touch any remotely exotic
> > > > fs call, but that hardly seems ideal.  Hopefully a common framework for
> > > > powerfail testing can improve on this.  Perhaps there are other ways we
> > > > can make it easier to tell what is (well) tested, and conversely ensure that
> > > > those tests are well-aligned with what real users are doing...
> > > 
> > > We don't actually need power failure (or even device failure)
> > > infrastructure to test data integrity on failure. Filesystems just
> > > need a shutdown method that stops any IO from being issued once the
> > > shutdown flag is set. XFS has this and it's used by xfstests via the
> > > "godown" utility to shut the fileystem down in various
> > > circumstances. We've been using this for data integrity and log
> > > recovery testing in xfstests for many years.
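For reference, godown amounts to little more than a call to the
XFS_IOC_GOINGDOWN ioctl; a minimal userspace sketch (assuming the xfs.h
header shipped with xfsprogs, error handling trimmed) looks like:

/* Minimal sketch of what xfstests' godown does: force an immediate XFS
 * shutdown through the XFS_IOC_GOINGDOWN ioctl. */
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>	/* XFS_IOC_GOINGDOWN, XFS_FSOP_GOING_FLAGS_* */

int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s <path-on-xfs>\n", argv[0]);
		return 1;
	}

	/* argv[1]: any file or directory on the filesystem to shut down */
	int fd = open(argv[1], O_RDONLY);
	/* shut down without flushing the log first (godown's default, IIRC) */
	uint32_t flags = XFS_FSOP_GOING_FLAGS_NOLOGFLUSH;

	if (fd < 0 || ioctl(fd, XFS_IOC_GOINGDOWN, &flags) < 0) {
		perror("XFS_IOC_GOINGDOWN");
		return 1;
	}
	close(fd);
	return 0;
}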
> > > 
> > > Hence we know if the device behaves correctly w.r.t cache flushes
> > > and FUA then the filesystem will behave correctly on power loss. We
> > > don't need a device power fail simulator to tell us violating
> > > fundamental architectural assumptions will corrupt filesystems....
> >   I think that an fs ioctl cannot easily simulate the situation where
> > on-device volatile caches aren't properly flushed in all the necessary
> > cases (we had bugs like this in ext3/4 in the past which were hit by real
> > users).
> 
> Sure, I'm not arguing that it does. I'm suggesting that it's the
> wrong place to be focussing effort on initially as it assumes the
> filesystem behaves correctly on simple device failures.  i.e. if
> filesystems fail to do the right thing on a block device that isn't
> lossy, then we've got big problems to solve before we even consider
> random "volatile cache blocks went missing" corruption and recovery
> issues.
> 
> i.e. what we need to focus on first is "failure paths are exercised
> and work reliably". When we have decent coverage of that for most
> filesystems (and we sure as hell don't for btrfs and ext4), then we
> can focus on "in this corner case of broken/lying hardware..."
> 
> > I also think that simulating the device failure in a different layer is
> > simpler than checking for a superblock flag in all the places where the
> > filesystem submits IO (e.g. ext4 doesn't have a dedicated buffer layer like
> > xfs has and we rely on the flusher thread to flush committed metadata to final
> 
> flusher threads call back into the filesystems to write both data
> and metadata, so I don't think that's an issue. And there's
> relatively few places you'd need to add flag support to (i.e.
> wrappers around submit_bh and submit_bio in the relevant layers)
> and that would trap all IO.
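For IO the filesystem submits itself such a trap is indeed trivial - a
sketch of the kind of wrapper you describe (illustrative only, not existing
kernel code; the myfs_* names are made up) would be something like:

/*
 * Illustrative only: a per-fs wrapper that traps all buffer IO once a
 * shutdown flag is set.  The myfs_* names are invented for the example.
 */
#include <linux/bitops.h>
#include <linux/buffer_head.h>

#define MYFS_SHUTDOWN	0		/* bit number in ->flags */

struct myfs_sb_info {
	unsigned long	flags;		/* fs-wide state bits */
};

static int myfs_submit_bh(struct myfs_sb_info *sbi, int rw,
			  struct buffer_head *bh)
{
	if (test_bit(MYFS_SHUTDOWN, &sbi->flags)) {
		/* Fail the IO without ever issuing it to the device. */
		clear_buffer_uptodate(bh);
		unlock_buffer(bh);
		return -EIO;
	}
	return submit_bh(rw, bh);	/* (rw, bh) signature as of today */
}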
  The problem is that the flusher threads don't call back into ext4 for its
metadata. Ext4 metadata is backed by the block device mapping. That mapping
is written back using generic_writepages(), which ends up calling
blkdev_writepage(), which just calls block_write_full_page() with the
blkdev_get_block() handler. The bad thing is that at that point we don't
have the context to decide which filesystem the writeback is coming from,
since the only inode we have is the block device inode belonging to the
block device superblock. So I don't see an easy way to solve this problem
for ext4.
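To illustrate, the writeback entry point in question is essentially just
this (roughly, from fs/block_dev.c):

/* Writeback of the block device mapping has no idea which filesystem, if
 * any, the dirty pages belong to, so there is nothing to hang a per-fs
 * shutdown check off. */
static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
{
	return block_write_full_page(page, blkdev_get_block, wbc);
}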

> Don't get fooled by the fact that XFS has lots of shutdown traps;
> there really are only three shutdown traps that prevent IO - one in
> xfs_buf_submit() for metadata IO, one in xfs_map_blocks() during
> ->writepage for data IO, and one in xlog_bdstrat() for log IO.
> 
> All the other shutdown traps are for aborting operations that may
> not reach the IO layer (as many operations will hit cached objects)
> or will fail later when the inevitable IO is done (e.g. on
> transaction commit). Hence shutdown traps get us fast, reliable
> responses to userspace when fatal corruption errors occur, and in
> doing so they also provide hooks for testing error paths in ways
> that otherwise are very difficult to exercise.
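For comparison, the shape of one of those traps (a simplified excerpt from
xfs_buf_submit()) is just:

	/* Simplified excerpt of the shutdown trap in xfs_buf_submit():
	 * once the fs is shut down, stale the buffer and complete it with
	 * -EIO instead of issuing the IO. */
	if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
		xfs_buf_ioerror(bp, -EIO);
		bp->b_flags &= ~XBF_DONE;
		xfs_buf_stale(bp);
		xfs_buf_ioend(bp);
		return;
	}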
  Ext4 also detects in some places whether the fs has been shut down and
bails out early, by checking whether the journal is aborted (the
is_journal_aborted() checks). For example, it doesn't start any new
transaction once the fs is shut down. It would be easy to add an ext4
ioctl() which aborts the journal, and that would exercise the error paths
we have. It's just that it would be a very different test from the
situation where the device goes away, power fails, or similar. For
verifying those cases, a target which just starts returning EIO for any
submitted IO is much easier for ext4.
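For completeness, the journal-abort ioctl could be as small as the
following sketch of a case added to ext4_ioctl() (EXT4_IOC_ABORT and its
command number are made up; jbd2_journal_abort() and is_journal_aborted()
are the existing interfaces):

	/* Hypothetical case in ext4_ioctl(): abort the journal so that all
	 * further transaction starts fail through the existing
	 * is_journal_aborted() checks, similar in effect to an XFS
	 * shutdown.  EXT4_IOC_ABORT is a made-up name. */
	case EXT4_IOC_ABORT:
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		if (EXT4_SB(sb)->s_journal)
			jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
		return 0;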

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR


