Re: nfsd: another possible delegation race

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 04 Oct 2022 07:19:49 -0400

On Tue, 2022-10-04 at 16:14 +1100, NeilBrown wrote:
> Hi,
>  I have a customer who experienced a crash in nfsd which appears to be
>  related to delegation return.  I cannot completely rule-out
>   Commit 548ec0805c39 ("nfsd: fix use-after-free due to delegation race")
>  as the kernel being used didn't have that commit, but the symptoms are
>  quite different, and while exploring I found, I think, a different
>  race.  This new race doesn't obviously address all the symptoms, but
>  maybe it does...
> 
>  The symptoms were:
>   1/   WARN_ON(!unhash_delegation_locked(dp));
>     in nfs4_laundromat complained (delegation wasn't hashed!)
>   2/   refcount_t: saturated; leaking memory
>     This came from the refcount_inc in revoke_delegation() called from
>     nfs4_laundromat(), a few lines below the above warning

Well, that is odd! Chuck has caught this a couple of times:

    https://bugzilla.linux-nfs.org/show_bug.cgi?id=394

...but that's an underflow.

>   3/ BUG: kernel NULL pointer dereference, address: 0000000000000028
>      This is from the destroy_unhashed_deleg() call at the end of
>      that same revoke_delegation() call, which calls
>      nfs4_unlock_deleg_lease() and passes fp->fi_deleg_file, which is
>      NULL (!!!), to vfs_setlease().
>   These three happened in a 200usec window.
> 
>  What I imagine might be happening is that the nfsd_break_deleg_cb()
>  callback is called after destroy_delegation() has unhashed the deleg,
>  but before destroy_unhashed_delegation() gets called.
> 

Ok, so a DELEGRETURN is racing with a lease break?

>  If nfsd_break_deleg_cb() is called before the unhash - and particularly
>  if nfsd_break_one_deleg()->nfsd4_run_cb() is called before, then the
>  unhash will disconnect the delegation from the recall list, and no
>  harm can be done.
>  Once vfs_setlease(F_UNLCK) is called, the callback can no longer be
>  called, so again no harm is possible.
> 
>  Between these two is a race window.  The delegation can be put on the
>  recall list, but the deleg will be unhashed and put_deleg_file() will
>  have set fi_deleg_file to NULL - resulting in first WARNING and the
>  BUG.
> 
>  I cannot see how the refcount_t warning can be generated ...  so maybe
>  I've missed something.
> 
>  My proposed solution is to test delegation_hashed() while holding
>  fp->fi_lock in nfsd_break_deleg_cb().  If the delegation is locked, we
>  can safely schedule the recall.  If it isn't, then the lease is about
>  to be unlocked and there is no need to schedule anything.
> 

I think you mean "If the delegation is hashed, we can safely schedule
the recall."

That sounds like it might be the right approach. Once we've unhashed the
stateid, I think we can safely assume that it's on its way out the door
and that we don't need to issue a recall.

>  I don't know this code at all well, so I thought it safest to ask for
>  comments before posting a proper patch.
>  I'm particularly curious to know if anyone has ideas about the refcount
>  overflow.  Corruption is unlikely as the deleg looked mostly OK and the
>  memory has been freed, but not reallocated (though it is possible it
>  was freed, reallocated, and freed again).
>  This wasn't a refcount_inc on a zero refcount - that gives a different
>  error.  I don't know what the refcount value was, it has already been
>  changed to the 'saturated' value - 0xc0000000.
> 
> 

That would be this, I think:

        else if (unlikely(old < 0 || old + i < 0))
                refcount_warn_saturate(r, REFCOUNT_ADD_OVF);

So the old or new value was < 0?

No idea how you get there though. I would think if we were leaking
delegations to that degree that we'd see leaked memory warnings when
shutting down nfsd.

> 
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index c5d199d7e6b4..e02d638df6be 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4822,8 +4822,10 @@ nfsd_break_deleg_cb(struct file_lock *fl)
>  	fl->fl_break_time = 0;
>  
>  	spin_lock(&fp->fi_lock);
> -	fp->fi_had_conflict = true;
> -	nfsd_break_one_deleg(dp);
> +	if (delegation_hashed(dp)) {
> +		fp->fi_had_conflict = true;
> +		nfsd_break_one_deleg(dp);
> +	}
>  	spin_unlock(&fp->fi_lock);
>  	return ret;
>  }
> 
> 

This looks reasonable to me.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>