Re: [PATCH 2/3] fs: wait for partially frozen filesystems

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Mon, 12 Jun 2023 11:36:57 -0700

On Mon, Jun 12, 2023 at 01:35:26PM +0200, Jan Kara wrote:
> On Sun 11-06-23 20:15:28, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > Jan Kara suggested that when one thread is in the middle of freezing a
> > filesystem, another thread trying to freeze the same fs but with a
> > different freeze_holder should wait until the freezer reaches either end
> > state (UNFROZEN or COMPLETE) instead of returning EBUSY immediately.
> > 
> > Plumb in the extra coded needed to wait for the fs freezer to reach an
> > end state and try the freeze again.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> You can maybe copy my reasoning from my reply to your patch 1/3 to the
> changelog to explain why waiting is more desirable.

Done.

"Neither caller can do anything sensible with this race other than retry
but they cannot really distinguish EBUSY as in "someone other holder of
the same type has the sb already frozen" from "freezing raced with
holder of a different type"

is now added to the commit message.

> > @@ -1690,11 +1699,13 @@ static int freeze_frozen_super(struct super_block *sb, enum freeze_holder who)
> >   */
> >  int freeze_super(struct super_block *sb, enum freeze_holder who)
> >  {
> > +	bool try_again = true;
> >  	int ret;
> >  
> >  	atomic_inc(&sb->s_active);
> >  	down_write(&sb->s_umount);
> >  
> > +retry:
> >  	if (sb->s_writers.frozen == SB_FREEZE_COMPLETE) {
> >  		ret = freeze_frozen_super(sb, who);
> >  		deactivate_locked_super(sb);
> > @@ -1702,8 +1713,14 @@ int freeze_super(struct super_block *sb, enum freeze_holder who)
> >  	}
> >  
> >  	if (sb->s_writers.frozen != SB_UNFROZEN) {
> > -		deactivate_locked_super(sb);
> > -		return -EBUSY;
> > +		if (!try_again) {
> > +			deactivate_locked_super(sb);
> > +			return -EBUSY;
> > +		}
> 
> Why only one retry? Do you envision someone will be freezing & thawing
> filesystem so intensively that livelocks are possible?

I was worried about that, but the more I think about it, the more I can
convince myself that we can take the simpler approach of cycling
s_umount until we get the state we want.

> In that case I'd
> think the system has other problems than freeze taking long... Honestly,
> I'd be looping as long as we either succeed or return error for other
> reasons.

Ok.

> What we could be doing to limit unnecessary waiting is that we'd update
> freeze_holders already when we enter freeze_super() and lock s_umount
> (bailing if our holder type is already set). That way we'd have at most one
> process for each holder type freezing the fs / waiting for freezing to
> complete.

<shrug> I don't know how often we even really have threads contending
for s_umount and elevated freeze state.  How about we go with the
simpler wait_for_partially_frozen and see if complaints crop up?

> > +
> > +		wait_for_partially_frozen(sb);
> > +		try_again = false;
> > +		goto retry;
> >  	}
> >  
> ...
> > @@ -1810,6 +1831,7 @@ static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
> >  	if (sb_rdonly(sb)) {
> >  		sb->s_writers.freeze_holders &= ~who;
> >  		sb->s_writers.frozen = SB_UNFROZEN;
> > +		wake_up_var(&sb->s_writers.frozen);
> >  		goto out;
> >  	}
> >  
> > @@ -1828,6 +1850,7 @@ static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
> >  
> >  	sb->s_writers.freeze_holders &= ~who;
> >  	sb->s_writers.frozen = SB_UNFROZEN;
> > +	wake_up_var(&sb->s_writers.frozen);
> >  	sb_freeze_unlock(sb, SB_FREEZE_FS);
> >  out:
> >  	wake_up(&sb->s_writers.wait_unfrozen);
> 
> I don't think wakeups on thaw are really needed because the waiter should
> be woken already once the freezing task exits from freeze_super(). I guess
> you've sprinkled them here just for good measure?

I've changed wait_for_partially_frozen to cycle s_umount until we find
the locked state we want, so the wake_up_var calls aren't needed
anymore.

--D

> 								Honza
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR