Re: [RFC v2] fs: add prepare_freeze/prepare_thaw fs hooks

"Benjamin Marzinski" <bmarzins@xxxxxxxxxx> · Mon, 29 Sep 2014 11:58:30 -0500

On Thu, Sep 25, 2014 at 11:42:24AM +1000, Dave Chinner wrote:
> On Wed, Sep 24, 2014 at 02:01:58PM -0500, Benjamin Marzinski wrote:
> > Currently, freezing a filesystem involves calling freeze_super, which locks
> > sb->s_umount and then calls the fs-specific freeze_fs hook. This makes it
> > hard for gfs2 (and potentially other cluster filesystems) to use the vfs
> > freezing code to do freezes on all the cluster nodes.
> > 
> > In order to communicate that a freeze has been requested, and to make sure
> > that only one node is trying to freeze at a time, gfs2 uses a glock
> > (sd_freeze_gl). The problem is that there is no hook for gfs2 to acquire
> > this lock before calling freeze_super. This means that two nodes can
> > attempt to freeze the filesystem by both calling freeze_super, acquiring
> > the sb->s_umount lock, and then attempting to grab the cluster glock
> > sd_freeze_gl. Only one will succeed, and the other will be stuck in
> > freeze_super, making it impossible to finish freezing the node.
> > 
> > To solve this problem, this patch adds the prepare_freeze prepare_thaw
> > hooks.  If a filesystem implements these hooks, they are called instead of
> > freeze_super and thaw_super. This means that every filesystem that
> > implements prepare_freeze/thaw must call freeze/thaw_super within that
> > function to make use of the vfs freezing code.
> 
> Why instead? The filesystem still have to call
> freeze_super/thaw_super() after "prepare".

I realize that the way I'm using them, prepare_* isn't the best name,
but at least for gfs2, it's much nicer to have all the nodes freezing
and thawing in the same bit of code.  This doesn't happen if the node
that gets the freeze call calls freeze_super in one place, and all the
other nodes call freeze_super in response the locking change. Ideally
it would work like:

1. freezing node grabs the freeze glock in prepare_freeze
2. All cluster nodes (including the node initiating the freeze)
	A. drop their cached freeze glock causing them to call
	   freeze_super
3. freezing node returns from prepare_freeze.

Things are also complicated on thaw.  The way I'd like that to work is

1. freezing node drops the freeze glock in prepare_thaw
2. All cluster nodes
	A. acquire the freeze glock in the shared state, disabling
	   freezing.
	B. call thaw_super to unfreeze the filesystem
	C. release the freeze glock, but keep it cached to
	   be able to respond to the freeze request

If the unfreezing node had to call thaw_super after prepare_thaw, you
would be in a situation where the node could be frozen again, while it
was trying to thaw. Again, this can be worked around by making these
functions work differently on the freezing node than the others, but
aside from the names, I don't see any problem about adding new hooks
that let the filesystem be in charge of when to call freeze/thaw_super
as long as they guarantee that it's called before the hook returns.

> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > ---
> >  fs/block_dev.c     | 10 ++++++++--
> >  fs/ioctl.c         |  6 +++++-
> >  include/linux/fs.h |  2 ++
> >  3 files changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 6d72746..f931412 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -245,7 +245,10 @@ struct super_block *freeze_bdev(struct block_device *bdev)
> >  	sb = get_active_super(bdev);
> >  	if (!sb)
> >  		goto out;
> > -	error = freeze_super(sb);
> > +	if (sb->s_op->prepare_freeze)
> > +		error = sb->s_op->prepare_freeze(sb);
> > +	else
> > +		error = freeze_super(sb);
> 
> I was proposing this callout to be like this:
> 
> 	if (sb->s_op->prepare_freeze)
> 		error = sb->s_op->prepare_freeze(sb);
> 	if (!error)
> 		error = freeze_super(sb);
> 	if (error) {
> 		....
> 
> i.e. prepare() does all the filesystem specific preparation,
> everything else goes through the normal freeze code.  The lack of
> gfs2 specific patches showing how gfs2 is going to use this
> interface is not helping here...

I'll post the gfs2 patch for this.

-Ben

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html