On Tue, 08 Oct 2024, Olga Kornievskaia wrote: > On Sat, Oct 5, 2024 at 5:09 PM NeilBrown <neilb@xxxxxxx> wrote: > > > > On Sun, 06 Oct 2024, Chuck Lever wrote: > > > On Sat, Oct 05, 2024 at 12:20:48PM -0400, Jeff Layton wrote: > > > > On Sat, 2024-10-05 at 10:53 -0400, Chuck Lever wrote: > > > > > On Fri, Oct 04, 2024 at 06:04:03PM -0400, Olga Kornievskaia wrote: > > > > > > When multiple FREE_STATEIDs are sent for the same delegation stateid, > > > > > > it can lead to a possible either use-after-tree or counter refcount > > > > > > underflow errors. > > > > > > > > > > > > In nfsd4_free_stateid() under the client lock we find a delegation > > > > > > stateid, however the code drops the lock before calling nfs4_put_stid(), > > > > > > that allows another FREE_STATE to find the stateid again. The first one > > > > > > will proceed to then free the stateid which leads to either > > > > > > use-after-free or decrementing already zerod counter. > > > > > > > > > > > > CC: stable@xxxxxxxxxxxxxxx > > > > > > > > > > I assume that the broken commit is pretty old, but this fix does not > > > > > apply before v6.9 (where sc_status is introduced). I can add > > > > > "# v6.9+" to the Cc: stable tag. > > > > > > > > > > > > > I don't know. It looks like nfsd4_free_stateid always returned > > > > NFS4ERR_LOCKS_HELD on a delegation stateid until 3f29cc82a84c. > > > > > > > > > But what do folks think about a Fixes: tag? > > > > > > > > > > Could be e1ca12dfb1be ("NFSD: added FREE_STATEID operation"), but > > > > > that doesn't have the switch statement, which was added by > > > > > 2da1cec713bc ("nfsd4: simplify free_stateid"). > > > > > > > > > > > > > > > > > > Maybe this one? > > > > > > > > 3f29cc82a84c nfsd: split sc_status out of sc_type > > > > > > > > That particular bit of the code (and the SC_STATUS_CLOSED flag) was > > > > added in that patch, and I don't think you'd want to apply this patch > > > > to anything that didn't have it. > > > > > > OK, if we believe that 3f29cc82 is where the misbehavior started, > > > then I can replace the "Cc: stable@" with "Fixes: 3f29cc82a84c". > > > > I believe the misbehaviour started with > > Commit: b0fc29d6fcd0 ("nfsd: Ensure stateids remain unique until they are freed") > > in v3.18. > > > > The bug in the current code is that it assumes that > > > > list_del_init(&dp->dl_recall_lru); > > > > actually removes from the the dl_recall_lru, and so a reference must be > > dropped. But if it wasn't on the list, then that is wrong. > > I've actually been chasing a different problem (a UAF) and Ben noticed > a problem with doing a double free (by double free_stateid) which this > patch addresses. But, this particular line list_del_init() in > nfsd4_free_stateid() has been bothering me as I thought this access > should be guarded by the "state_lock"? Though I have to admit I've > tried that and it doesn't seem to help my UAF problem. Anyway where > I'm going with it perhaps the guard "if > (!list_empty(&dp->dl_recall_lru))" is still needed (not for double > free_stateid by from other possibilities)? dl_recall_lru is a bit confusing. Sometimes it is on nn->del_recall_lru in which case it is protected by state_lock. Sometimes it is on clp->cl_revoked in which case it is protected by clp->cl_lock. And sometimes it is on reaplist which doesn't need protection. So it is important to update the state of the delegation when it is moved between lists or removed from a list. I think we now do thanks to your patch, but it wouldn't hurt to audit all accesses again... NeilBrown > > I was wondering if the nfsd4_free_stateid() somehow could steal the > entries from the list while the laundromat is going thru the > revocation process. The problem I believe is that the laundromat > thread marks the delegation "revoked" but somehow never ends up > calling destroy_delegation() (destoy_delegation is the place that > frees the lease -- but instead we are left with a lease on the file > which causes a new open to call into break_lease() which ends up doing > a UAF on a freed delegation stateid -- which was freed by the > free_stateid). > > > > So a "if (!list_empty(&dp->dl_recall_lru))" guard might also fix the > > bug (though adding SC_STATUS_CLOSED is a better fix I think). > > > > Prior to the above 3.17 commit, the relevant code was > > > > static void destroy_revoked_delegation(struct nfs4_delegation *dp) > > { > > list_del_init(&dp->dl_recall_lru); > > remove_stid(&dp->dl_stid); > > nfs4_put_delegation(dp); > > } > > > > so the revoked delegation would be removed (remove_stid) from the idr > > and a subsequent FREE_STATEID request would not find it. > > The commit removed the remove_stid() call but didn't do anything to > > prevent the free_stateid being repeated. > > In that kernel it might have been appropriate to set > > dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID; > > was done to unhash_delegation() in that patch. > > > > So I think we should declare > > Fixes: b0fc29d6fcd0 ("nfsd: Ensure stateids remain unique until they are freed") > > > > and be prepared to provide alternate patches for older kernels. > > > > NeilBrown > > > > > > > > > > > > > > Signed-off-by: Olga Kornievskaia <okorniev@xxxxxxxxxx> > > > > > > --- > > > > > > fs/nfsd/nfs4state.c | 1 + > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > > > > index ac1859c7cc9d..56b261608af4 100644 > > > > > > --- a/fs/nfsd/nfs4state.c > > > > > > +++ b/fs/nfsd/nfs4state.c > > > > > > @@ -7154,6 +7154,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > > > > switch (s->sc_type) { > > > > > > case SC_TYPE_DELEG: > > > > > > if (s->sc_status & SC_STATUS_REVOKED) { > > > > > > + s->sc_status |= SC_STATUS_CLOSED; > > > > > > spin_unlock(&s->sc_lock); > > > > > > dp = delegstateid(s); > > > > > > list_del_init(&dp->dl_recall_lru); > > > > > > -- > > > > > > 2.43.5 > > > > > > > > > > > > > > > > > > > -- > > > > Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > -- > > > Chuck Lever > > > > > > > >