On Tue, Oct 03, 2017 at 09:42:04PM -0400, Theodore Ts'o wrote: > On Tue, Oct 03, 2017 at 10:13:21PM +0200, Luis R. Rodriguez wrote: > > > After we remove add the NEEDS_RECOVERY flag, we need to make sure > > > recovery flag is pushed out to disk before any other changes are > > > allowed to be pushed out to disk. That's why we originally did the > > > update synchronously. > > > > I see. I had to try to dig through linux-history to see why this was done, and > > I actually could not find an exact commit which explained what you just did. > > Thanks! > > > > Hrm, so on freeze we issue the same commit as well, so is a sync *really* needed > > on thaw? > > So let me explain what's going on. When we do a freeze, we make sure > all of the blocks that are written to the journal are writen to the > final location on disk, and the journal is is truncated. (This is > called a "journal checkpoint".) Then we clear the NEEDS RECOVERY > feature flag and set the EXT4_VALID_FS flags in the superblock. In > the no journal case, we flush out all metadata writes, and we set the > EXT4_VALID_FS flag. In both cases, we call ext4_commit_super(sb, 1) > to make sure the flags update is pushed out to disk. This puts the > file system into a "quiscent" state; in fact, it looks like the file > system has been unmounted, so that it becomes safe to run > dump/restore, run fsck -n on the file system, etc. > > The problem on the thaw side is that we need to make sure that > EXT4_VALID_FS flag is removed (and if journaling is enabled, the NEEDS > RECOVERY feature flag is set) and the superblock is flushed out to the > storage device before any other writes are persisted on the disk. In > the case where we have journalling enabled, we could wait until the > first journal commit to write out the superblock, since in journal > mode all metadata updates to the disk are suppressed until the journal > commit. We don't do that today, but it is a change we could make. > > However, in the no journal node we need to make sure the EXT4_VALID_FS > flag is cleared on disk before any other metadata operations can take > place, and calling ext4_commit_super(sb, 1) is the only real way to do > that. > > > No, it was am empirical evaluation done at testing, I observed bio_submit() > > stalls, we never get completion from the device. Even if we call thaw at the > > very end of resume, after the devices should be up, we still end up in the same > > situation. Given how I order freezing filesystems after freezing userspace I do > > believe we should thaw filesystems prior unfreezing userspace, its why I placed > > the call where it is now. > > So when we call ext4_commit_super(sb, 1), we issue the bio, and then > we block waiting for the I/O to complete. That's a blocking call. Is > the thaw context one which allows blocking? thaw_super() does down_write(), so it *must* be able to sleep. > If userspace is still > frozen, does that imply that the scheduler isn't allow to run? If > that's the case, then that's probably the problem. > > More generally, if the thawing code needs to allocate memory, or do Which is does. xfs_fs_unfreeze() queues work to a workqueue, so there's memory allocation there. > any number of things that could potentially block, this could > potentially be an issue. yup, gfs2_unfreeze does even more stuff - it releases glocks which may end up queuing work, waking other threads and freeing stuff via call_rcu()... Basically, before thawing filesystems the rest of the kernel infrastructure needs to have been restarted. i.e. the order needs to be: freeze userspace freeze filesystems freeze kernel threads freeze workqueues thaw workqueues thaw kernel threads thaw filesystems thaw userspace and it should end up that way. > Or if we have a network block device, or > something else in the storage stack that needs to run a kernel thread > context (or a workqueue, etc.) --- is the fact that userspace is > frozen mean the scheduler is going to refuse to schedule()? No. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx