Hi, I have a customer who experienced a crash in nfsd which appears to be related to delegation return. I cannot completely rule-out Commit 548ec0805c39 ("nfsd: fix use-after-free due to delegation race") as the kernel being used didn't have that commit, but the symptoms are quite different, and while exploring I found, I think, a different race. This new race doesn't obviously address all the symptoms, but maybe it does... The symptoms were: 1/ WARN_ON(!unhash_delegation_locked(dp)); in nfs4_laundromat complained (delegation wasn't hashed!) 2/ refcount_t: saturated; leaking memory This came from the refcount_inc in revoke_delegation() called from nfs4_laundromat(), a few lines below the above warning 3/ BUG: kernel NULL pointer dereference, address: 0000000000000028 This is from the destroy_unhashed_deleg() call at the end of that same revoke_delegation() call, which calls nfs4_unlock_deleg_lease() and passes fp->fi_deleg_file, which is NULL (!!!), to vfs_setlease(). These three happened in a 200usec window. What I imagine might be happening is that the nfsd_break_deleg_cb() callback is called after destroy_delegation() has unhashed the deleg, but before destroy_unhashed_delegation() gets called. If nfsd_break_deleg_cb() is called before the unhash - and particularly if nfsd_break_one_deleg()->nfsd4_run_cb() is called before, then the unhash will disconnect the delegation from the recall list, and no harm can be done. Once vfs_setlease(F_UNLCK) is called, the callback can no longer be called, so again no harm is possible. Between these two is a race window. The delegation can be put on the recall list, but the deleg will be unhashed and put_deleg_file() will have set fi_deleg_file to NULL - resulting in first WARNING and the BUG. I cannot see how the refcount_t warning can be generated ... so maybe I've missed something. My proposed solution is to test delegation_hashed() while holding fp->fi_lock in nfsd_break_deleg_cb(). If the delegation is locked, we can safely schedule the recall. If it isn't, then the lease is about to be unlocked and there is no need to schedule anything. I don't know this code at all well, so I thought it safest to ask for comments before posting a proper patch. I'm particularly curious to know if anyone has ideas about the refcount overflow. Corruption is unlikely as the deleg looked mostly OK and the memory has been freed, but not reallocated (though it is possible it was freed, reallocated, and freed again). This wasn't a refcount_inc on a zero refcount - that gives a different error. I don't know what the refcount value was, it has already been changed to the 'saturated' value - 0xc0000000. Thanks, NeilBrown diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c5d199d7e6b4..e02d638df6be 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4822,8 +4822,10 @@ nfsd_break_deleg_cb(struct file_lock *fl) fl->fl_break_time = 0; spin_lock(&fp->fi_lock); - fp->fi_had_conflict = true; - nfsd_break_one_deleg(dp); + if (delegation_hashed(dp)) { + fp->fi_had_conflict = true; + nfsd_break_one_deleg(dp); + } spin_unlock(&fp->fi_lock); return ret; }