On Wed, 2021-11-24 at 10:59 -0500, Trond Myklebust wrote: > On Wed, 2021-11-24 at 10:29 -0500, Bruce Fields wrote: > > On Mon, Nov 22, 2021 at 03:17:28PM +0000, Chuck Lever III wrote: > > > > > > > > > > On Nov 22, 2021, at 4:15 AM, Olivier Monaco > > > > <olivier@xxxxxxxxxxxxxxx> wrote: > > > > > > > > Hi, > > > > > > > > I think my problem is related to this thread. > > > > > > > > We are running a VMware vCenter platform running 9 groups of > > > > virtual machines. Each group includes a VM with NFS for file > > > > sharing and 3 VMs with NFS clients, so we are running 9 > > > > independent file servers. > > > > > > I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371 > > > > > > Just a random thought: would enabling KASAN shed some light? > > > > In fact, we've gotten reports from Redhat QE of a KASAN use-after- > > free > > warning in the laundromat thread, which I think might be the same > > bug. > > We've been getting occasional reports of problems here for a long > > time, > > but they've been very hard to reproduce. > > > > After fooling with their reproducer, I think I finally have it. > > Embarrasingly, it's nothing that complicated. You can make it much > > easier to reproduce by adding an msleep call after the vfs_setlease > > in > > nfs4_set_delegation. > > > > If it's possible to run a patched kernel in production, you could > > try > > the following, and I'd definitely be interested in any results. > > > > Otherwise, you can probably work around the problem by disabling > > delegations. Something like > > > > sudo echo "fs.leases-enable = 0" >/etc/sysctl.d/nfsd- > > workaround.conf > > > > should do it. > > > > Not sure if this fix is best or if there's something simpler. > > > > --b. > > > > commit 6de51237589e > > Author: J. Bruce Fields <bfields@xxxxxxxxxx> > > Date: Tue Nov 23 22:31:04 2021 -0500 > > > > nfsd: fix use-after-free due to delegation race > > > > A delegation break could arrive as soon as we've called > > vfs_setlease. A > > delegation break runs a callback which immediately (in > > nfsd4_cb_recall_prepare) adds the delegation to > > del_recall_lru. > > If we > > then exit nfs4_set_delegation without hashing the delegation, > > it > > will be > > freed as soon as the callback is done with it, without ever > > being > > removed from del_recall_lru. > > > > Symptoms show up later as use-after-free or list corruption > > warnings, > > usually in the laundromat thread. > > > > Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > index bfad94c70b84..8e063f49240b 100644 > > --- a/fs/nfsd/nfs4state.c > > +++ b/fs/nfsd/nfs4state.c > > @@ -5159,15 +5159,16 @@ nfs4_set_delegation(struct nfs4_client > > *clp, > > struct svc_fh *fh, > > locks_free_lock(fl); > > if (status) > > goto out_clnt_odstate; > > + > > status = nfsd4_check_conflicting_opens(clp, fp); > > - if (status) > > - goto out_unlock; > > > > spin_lock(&state_lock); > > spin_lock(&fp->fi_lock); > > - if (fp->fi_had_conflict) > > + if (status || fp->fi_had_conflict) { > > + list_del_init(&dp->dl_recall_lru); > > + dp->dl_time++; > > status = -EAGAIN; > > - else > > + } else > > status = hash_delegation_locked(dp, fp); > > spin_unlock(&fp->fi_lock); > > spin_unlock(&state_lock); > > Why won't this leak a reference to the stid? Afaics > nfsd_break_one_deleg() does take a reference before launching the > callback daemon, and that reference is released when the delegation > is > later processed by the laundromat. > > Hmm... Isn't the real bug here rather that the laundromat is > corrupting > both the nn->client_lru and nn->del_recall_lru lists because it is > using list_add() instead of list_move() when adding these objects to > the reaplist? > OK. Looks like mark_client_expired_locked() does call list_del() on clp->cl_lru on success, so I'm wrong about that. However I don't see where anything is removing dp->dl_recall_lru, so I still think the laundromat is corrupting the nn->del_recall_lru list. -- Trond Myklebust CTO, Hammerspace Inc 4984 El Camino Real, Suite 208 Los Altos, CA 94022 www.hammer.space