Re: [RFC 4/5] ext4: add fs freezing support on suspend/hibernation

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 4 Oct 2017 18:05:23 +1100

On Tue, Oct 03, 2017 at 09:42:04PM -0400, Theodore Ts'o wrote:
> On Tue, Oct 03, 2017 at 10:13:21PM +0200, Luis R. Rodriguez wrote:
> > > After we remove add the NEEDS_RECOVERY flag, we need to make sure
> > > recovery flag is pushed out to disk before any other changes are
> > > allowed to be pushed out to disk.  That's why we originally did the
> > > update synchronously.
> > 
> > I see. I had to try to dig through linux-history to see why this was done, and
> > I actually could not find an exact commit which explained what you just did.
> > Thanks!
> > 
> > Hrm, so on freeze we issue the same commit as well, so is a sync *really* needed
> > on thaw?
> 
> So let me explain what's going on.  When we do a freeze, we make sure
> all of the blocks that are written to the journal are writen to the
> final location on disk, and the journal is is truncated.  (This is
> called a "journal checkpoint".)  Then we clear the NEEDS RECOVERY
> feature flag and set the EXT4_VALID_FS flags in the superblock.  In
> the no journal case, we flush out all metadata writes, and we set the
> EXT4_VALID_FS flag.  In both cases, we call ext4_commit_super(sb, 1)
> to make sure the flags update is pushed out to disk.  This puts the
> file system into a "quiscent" state; in fact, it looks like the file
> system has been unmounted, so that it becomes safe to run
> dump/restore, run fsck -n on the file system, etc.
> 
> The problem on the thaw side is that we need to make sure that
> EXT4_VALID_FS flag is removed (and if journaling is enabled, the NEEDS
> RECOVERY feature flag is set) and the superblock is flushed out to the
> storage device before any other writes are persisted on the disk.  In
> the case where we have journalling enabled, we could wait until the
> first journal commit to write out the superblock, since in journal
> mode all metadata updates to the disk are suppressed until the journal
> commit.  We don't do that today, but it is a change we could make.
> 
> However, in the no journal node we need to make sure the EXT4_VALID_FS
> flag is cleared on disk before any other metadata operations can take
> place, and calling ext4_commit_super(sb, 1) is the only real way to do
> that.
> 
> > No, it was am empirical evaluation done at testing, I observed bio_submit()
> > stalls, we never get completion from the device. Even if we call thaw at the
> > very end of resume, after the devices should be up, we still end up in the same
> > situation. Given how I order freezing filesystems after freezing userspace I do
> > believe we should thaw filesystems prior unfreezing userspace, its why I placed
> > the call where it is now.
> 
> So when we call ext4_commit_super(sb, 1), we issue the bio, and then
> we block waiting for the I/O to complete.  That's a blocking call.  Is
> the thaw context one which allows blocking?

thaw_super() does down_write(), so it *must* be able to sleep.

> If userspace is still
> frozen, does that imply that the scheduler isn't allow to run?  If
> that's the case, then that's probably the problem.
> 
> More generally, if the thawing code needs to allocate memory, or do

Which is does. xfs_fs_unfreeze() queues work to a workqueue, so
there's memory allocation there.

> any number of things that could potentially block, this could
> potentially be an issue.

yup, gfs2_unfreeze does even more stuff - it releases glocks which
may end up queuing work, waking other threads and freeing stuff via
call_rcu()...

Basically, before thawing filesystems the rest of the kernel
infrastructure needs to have been restarted. i.e. the order
needs to be:

freeze userspace
freeze filesystems
freeze kernel threads
freeze workqueues

thaw workqueues
thaw kernel threads
thaw filesystems
thaw userspace

and it should end up that way.

> Or if we have a network block device, or
> something else in the storage stack that needs to run a kernel thread
> context (or a workqueue, etc.) --- is the fact that userspace is
> frozen mean the scheduler is going to refuse to schedule()?

No.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html