Re: [PATCH] fs / ext3: Always unlock updates in ext3_freeze()

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 16 Aug 2011 10:09:16 +1000

On Mon, Aug 15, 2011 at 10:58:07PM +0200, Jan Kara wrote:
>   Hello,
> 
> On Mon 15-08-11 20:09:13, Rafael J. Wysocki wrote:
> > On Monday, August 15, 2011, Jan Kara wrote:
> > >   BTW,  filesystem freezing never really worked for mmaped writes under
> > > ext3 - ext3 would have to implement page_mkwrite() callback for that - so
> > > if you want to rely on it for suspending, this will be non-trivial.
> > 
> > At this point the purpose of freezing filesystems is basically to
> > prevent XFS from deadlocking with hibernation's memory preallocation.
> > For other filesystems it may or may not make a difference depending on
> > their implementation of freeze/unfreeze_super().
>   What's exactly the problem? Memory preallocation enters direct reclaim
> and that deadlocks in the filesystem?

Well, that's one possible manifestation. The problem is that the
current hibernate code still assumes that sys_sync() results in an
idle filesystem that will not change after the call if nothing is
dirty.

The result is that when the large memory allocation for the hibernate
image occurs (after the sys_sync() call), shrink_slab() tends to be
called. The XFS shrinkers are capable of dirtying inodes
and the backing buffers of inodes that are in the reclaimable state.
But those buffers cannot be flushed to disk because hibernate has
already frozen the xfsbufd threads, so the shrinker doing inode
reclaim hangs up on locks waiting for the buffers to be written.
This either leads to deadlock or hibernate image allocation failure.

Far worse, IMO, is the case where it -doesn't- deadlock, because the
filesystem state can still be changing after the allocation has
finished due to async metadata IO completions. That has the
potential to cause filesystem corruption as after resume the on-disk
state may not match what is written from memory to the hibernate
image.
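
To make that failure mode concrete, here is a toy userspace model in
Python. It is only a sketch, not kernel code: ToyFS, submit_async(),
complete_io() and hibernate_and_resume() are made-up names, and the
"freeze" is assumed to do nothing more than drain in-flight IO before
the image is taken. It snapshots memory into an image, lets any
remaining async IO complete afterwards, restores memory from the image
and reports whether memory and disk still agree:

```python
import copy


class ToyFS:
    """Toy filesystem: an on-disk view plus the in-memory view of it."""

    def __init__(self):
        self.disk = {"log_tail": 4}
        self.memory = {"log_tail": 4}
        self.pending = []  # async metadata IO submitted but not yet completed

    def submit_async(self, key, value):
        # Hypothetical: background work queued after sys_sync() returned.
        self.pending.append((key, value))

    def complete_io(self):
        # IO completion updates the disk and the in-memory view together.
        for key, value in self.pending:
            self.disk[key] = value
            self.memory[key] = value
        self.pending.clear()


def hibernate_and_resume(fs, freeze_first):
    """Return True if memory and disk agree after a simulated resume."""
    if freeze_first:
        # Assumption of this model: freezing drains in-flight IO first.
        fs.complete_io()
    image = copy.deepcopy(fs.memory)  # hibernate image written from memory
    fs.complete_io()                  # late completions land after the snapshot
    fs.memory = image                 # resume: memory restored from the image
    return fs.memory == fs.disk


if __name__ == "__main__":
    for freeze_first in (False, True):
        fs = ToyFS()
        fs.submit_async("log_tail", 5)
        print(freeze_first, hibernate_and_resume(fs, freeze_first))
```

Without the freeze step the restored memory still thinks the log tail
is at 4 while the disk has moved on to 5, which is exactly the kind of
mismatch described above. With the drain done first, the two stay
consistent in this model.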

The problem really isn't XFS specific, nor is it new - the fact is
that any filesystem that has registered a shrinker or can do async
work in the background post-sync is vulnerable to this problem. It's
just that XFS is the filesystem that usually exposes such issues, so
it gets blamed for causing the problem....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx