Re: [RFC 4/5] ext4: add fs freezing support on suspend/hibernation

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 5 Oct 2017 09:22:33 +1100

On Wed, Oct 04, 2017 at 12:48:39PM -0400, Theodore Ts'o wrote:
> On Wed, Oct 04, 2017 at 06:05:23PM +1100, Dave Chinner wrote:
> > Basically, before thawing filesystems the rest of the kernel
> > infrastructure needs to have been restarted. i.e. the order
> > needs to be:
> > 
> > freeze userspace
> > freeze filesystems
> > freeze kernel threads
> > freeze workqueues
> > 
> > thaw workqueues
> > thaw kernel threads
> > thaw filesystems
> > thaw userspace
> > 
> > and it should end up that way.
> > 
> > > Or if we have a network block device, or
> > > something else in the storage stack that needs to run a kernel thread
> > > context (or a workqueue, etc.) --- is the fact that userspace is
> > > frozen mean the scheduler is going to refuse to schedule()?
> > 
> > No.
> 
> Well, that's what the answer *should* be.  I was asking what this
> patch series does, and given that Luis reported that with this patch
> series ext4_commit_super(sb, 1) is hanging, I have my suspicions about
> what the answer might be with this patch set.  (Especially since the
> claimed goal of the patch set is, "kthread freezing with filesystem
> freeze/thaw".)

There are known bugs in the patchset w.r.t. workqueues and kernel
threads, and IO completion requires workqueues and kernel threads to
be running correctly. Hence filesystem thaw needs to occur after
workqueues and kernel threads are thawed.

The existing code already makes this assumption: filesystems will
start working again the moment workqueues and kernel threads are
thawed, but trying to do IO before that will not work. The same
applies to freeze/thaw of filesystems: thawing filesystems will not
work if the rest of the kernel machinery is still frozen...
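
To make the ordering constraint concrete, here is a toy model of it
in Python. It is only a sketch: the stage names are the ones from the
list quoted above, and suspend_resume_sequence() and thaw_is_safe()
are hypothetical helpers made up for illustration, not kernel
interfaces. It shows that thawing in the exact reverse of the freeze
order brings workqueues and kernel threads back before filesystems.

```python
# Toy model only: the stage names mirror the ordering discussed in this
# mail; nothing here corresponds to a real kernel API.
FREEZE_ORDER = ["userspace", "filesystems", "kernel threads", "workqueues"]


def suspend_resume_sequence(freeze_order):
    """Return the (action, stage) steps for one suspend/resume cycle."""
    freeze_steps = [("freeze", stage) for stage in freeze_order]
    # Thaw is the exact reverse of freeze, so anything frozen after a
    # stage is running again before that stage is thawed.
    thaw_steps = [("thaw", stage) for stage in reversed(freeze_order)]
    return freeze_steps + thaw_steps


def thaw_is_safe(steps, stage, needs):
    """True if every stage in `needs` is thawed before `stage` is."""
    thawed = [name for action, name in steps if action == "thaw"]
    return all(thawed.index(dep) < thawed.index(stage) for dep in needs)


if __name__ == "__main__":
    for action, stage in suspend_resume_sequence(FREEZE_ORDER):
        print(action, stage)
```

Calling thaw_is_safe(steps, "filesystems", ["workqueues", "kernel
threads"]) on the sequence above returns True; a sequence that thaws
filesystems first fails the same check, which is the situation where
IO completion has nothing to run on.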

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx