Re: [PATCH] vfs: freeze filesystems just prior to reboot

Colin Walters <walters@xxxxxxxxxx> · Thu, 03 Aug 2017 16:24:50 -0400

Resurrecting this thread:

On Fri, May 19, 2017, at 03:01 PM, Darrick J. Wong wrote:
> On Fri, May 19, 2017 at 10:00:31AM -0400, Colin Walters wrote:
> > On Thu, May 18, 2017, at 08:20 PM, Darrick J. Wong wrote:
> > 
> > > Therefore, add a reboot hook to freeze all filesystems (which in general
> > > will induce ext4/xfs/btrfs to checkpoint the log) just prior to reboot.
> > > This is an unfortunate and insufficient workaround for multiple layers
> > > of inadequate external software, but at least it will reduce boot time
> > > surprises for the "OS updater failed to disengage the filesystem before
> > > rebooting" case.
> > 
> > As a maintainer of one of those userspace tools
> > (https://github.com/ostreedev/ostree), which I don't think is the one
> > in question here, but likely has the same issue - I'd like to have
> > some sort of API to fix this - maybe flush the journal *without*
> > remounting r/o?
> 
> The convention (at least among ext4 and xfs) is that fs freeze should be
> checkpointing the journal.

OK, so I finally implemented this:
https://github.com/ostreedev/ostree/pull/1049

I had to go to some awkward lengths to try to make this safe; everything
in libostree is designed to be "crash only" - we're an update system
that doesn't install a SIGINT/SIGTERM handler, we just let the kernel
kill us, and that should always be safe.  But if we're interrupted right after
we invoke FIFREEZE, we'd leave the fs frozen.

Any objections to something like a single ioctl(fd, FIFREEZETHAW, 0)?
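
For illustration, a minimal sketch of the two-step freeze-thaw that userspace
has to do today, and of the window described above.  This is not the code
from the ostree pull request; freeze_thaw() and its injectable `ioctl`
parameter are hypothetical names used only here.  The request numbers are
the FIFREEZE/FITHAW values from <linux/fs.h>.  The real ioctls need
CAP_SYS_ADMIN.

```python
import fcntl
import os

# From <linux/fs.h>: _IOWR('X', 119, int) and _IOWR('X', 120, int).
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878


def freeze_thaw(mountpoint, ioctl=fcntl.ioctl):
    """Freeze, then immediately thaw, the filesystem containing mountpoint.

    `ioctl` is injectable (an assumption of this sketch) so the call
    sequence can be exercised without root or a real freeze.
    """
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        ioctl(fd, FIFREEZE, 0)
        # Unprotected window: if the process is killed here (e.g. SIGKILL),
        # nothing in this process can run FITHAW and the fs stays frozen.
        ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

A combined FIFREEZETHAW would let the kernel do both steps in one call, so
there would be no point at which userspace could die holding the freeze.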

I was thinking about this more though, and while this obviously helps,
it's still just narrowing a window; if we have a system crash after
writing the config but before we've done a freeze-thaw, we still
have the journaled data problem.

In the end, the real fix is probably something like storing
multiple copies of the bootloader config with checksums that grub
can verify.  Basically teach grub to try really hard to extract known-good
data from the FS.  For file-level consistency that'd be pretty easy,
we could have e.g.:
/boot/efi/grub.cfg
/boot/efi/grub.cfg.checksum (sha256 of grub.cfg)
/boot/efi/grub.cfg.orig
/boot/efi/grub.cfg.orig.checksum (sha256 of grub.cfg.orig)
etc.
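
The file-level check for the layout above could look roughly like the
following sketch.  It is written in Python only to show the selection logic;
a real implementation would have to live in grub itself.  The function names
and the assumption that each .checksum file starts with the hex sha256
digest are hypothetical, not an existing grub or ostree interface.

```python
import hashlib
import os


def sha256_hex(path):
    """Return the hex sha256 digest of the file at path."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def pick_verified_config(directory, candidates=("grub.cfg", "grub.cfg.orig")):
    """Return the first candidate whose content matches its .checksum file.

    Candidates are tried in order; a missing, unreadable, empty, or
    mismatching file is skipped.  Returns None if nothing verifies.
    """
    for name in candidates:
        cfg = os.path.join(directory, name)
        try:
            with open(cfg + ".checksum", "rb") as f:
                fields = f.read().split()
            if not fields:
                continue  # empty or truncated checksum file
            if sha256_hex(cfg).encode("ascii") == fields[0].lower():
                return cfg
        except OSError:
            continue  # file missing or unreadable: try the next copy
    return None
```

This only covers corruption of file contents; it says nothing about whether
the directory entries themselves survive an unreplayed journal.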

But what I don't know offhand without diving a lot more into XFS
internals is how resilient such a scheme would be against the
outstanding journal writes for the directory.  (Maybe it's more
resilient to use separate /boot/efi/grub-new and /boot/efi/grub-old
dirs?)