nfsd: another possible delegation race

"NeilBrown" <neilb@xxxxxxx> · Tue, 04 Oct 2022 16:14:47 +1100

Hi,
 I have a customer who experienced a crash in nfsd which appears to be
 related to delegation return.  I cannot completely rule-out
  Commit 548ec0805c39 ("nfsd: fix use-after-free due to delegation race")
 as the kernel being used didn't have that commit, but the symptoms are
 quite different, and while exploring I found, I think, a different
 race.  This new race doesn't obviously address all the symptoms, but
 maybe it does...

 The symptoms were:
  1/   WARN_ON(!unhash_delegation_locked(dp));
    in nfs4_laundromat complained (delegation wasn't hashed!)
  2/   refcount_t: saturated; leaking memory
    This came from the refcount_inc in revoke_delegation() called from
    nfs4_laundromat(), a few lines below the above warning
  3/ BUG: kernel NULL pointer dereference, address: 0000000000000028
     This is from the destroy_unhashed_deleg() call at the end of
     that same revoke_delegation() call, which calls
     nfs4_unlock_deleg_lease() and passes fp->fi_deleg_file, which is
     NULL (!!!), to vfs_setlease().
  These three happened in a 200usec window.

 What I imagine might be happening is that the nfsd_break_deleg_cb()
 callback is called after destroy_delegation() has unhashed the deleg,
 but before destroy_unhashed_delegation() gets called.

 If nfsd_break_deleg_cb() is called before the unhash - and particularly
 if nfsd_break_one_deleg()->nfsd4_run_cb() is called before, then the
 unhash will disconnect the delegation from the recall list, and no
 harm can be done.
 Once vfs_setlease(F_UNLCK) is called, the callback can no longer be
 called, so again no harm is possible.

 Between these two is a race window.  The delegation can be put on the
 recall list, but the deleg will be unhashed and put_deleg_file() will
 have set fi_deleg_file to NULL - resulting in first WARNING and the
 BUG.

 I cannot see how the refcount_t warning can be generated ...  so maybe
 I've missed something.

 My proposed solution is to test delegation_hashed() while holding
 fp->fi_lock in nfsd_break_deleg_cb().  If the delegation is locked, we
 can safely schedule the recall.  If it isn't, then the lease is about
 to be unlocked and there is no need to schedule anything.

 I don't know this code at all well, so I thought it safest to ask for
 comments before posting a proper patch.
 I'm particularly curious to know if anyone has ideas about the refcount
 overflow.  Corruption is unlikely as the deleg looked mostly OK and the
 memory has been freed, but not reallocated (though it is possible it
 was freed, reallocated, and freed again).
 This wasn't a refcount_inc on a zero refcount - that gives a different
 error.  I don't know what the refcount value was, it has already been
 changed to the 'saturated' value - 0xc0000000.

Thanks,
NeilBrown

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index c5d199d7e6b4..e02d638df6be 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4822,8 +4822,10 @@ nfsd_break_deleg_cb(struct file_lock *fl)
 	fl->fl_break_time = 0;
 
 	spin_lock(&fp->fi_lock);
-	fp->fi_had_conflict = true;
-	nfsd_break_one_deleg(dp);
+	if (delegation_hashed(dp)) {
+		fp->fi_had_conflict = true;
+		nfsd_break_one_deleg(dp);
+	}
 	spin_unlock(&fp->fi_lock);
 	return ret;
 }