On Wed, Nov 24, 2021 at 03:59:47PM +0000, Trond Myklebust wrote: > On Wed, 2021-11-24 at 10:29 -0500, Bruce Fields wrote: > > On Mon, Nov 22, 2021 at 03:17:28PM +0000, Chuck Lever III wrote: > > > > > > > > > > On Nov 22, 2021, at 4:15 AM, Olivier Monaco > > > > <olivier@xxxxxxxxxxxxxxx> wrote: > > > > > > > > Hi, > > > > > > > > I think my problem is related to this thread. > > > > > > > > We are running a VMware vCenter platform running 9 groups of > > > > virtual machines. Each group includes a VM with NFS for file > > > > sharing and 3 VMs with NFS clients, so we are running 9 > > > > independent file servers. > > > > > > I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371 > > > > > > Just a random thought: would enabling KASAN shed some light? > > > > In fact, we've gotten reports from Redhat QE of a KASAN use-after- > > free > > warning in the laundromat thread, which I think might be the same > > bug. > > We've been getting occasional reports of problems here for a long > > time, > > but they've been very hard to reproduce. > > > > After fooling with their reproducer, I think I finally have it. > > Embarrasingly, it's nothing that complicated. You can make it much > > easier to reproduce by adding an msleep call after the vfs_setlease > > in > > nfs4_set_delegation. > > > > If it's possible to run a patched kernel in production, you could try > > the following, and I'd definitely be interested in any results. > > > > Otherwise, you can probably work around the problem by disabling > > delegations. Something like > > > > sudo echo "fs.leases-enable = 0" >/etc/sysctl.d/nfsd- > > workaround.conf > > > > should do it. > > > > Not sure if this fix is best or if there's something simpler. > > > > --b. > > > > commit 6de51237589e > > Author: J. Bruce Fields <bfields@xxxxxxxxxx> > > Date: Tue Nov 23 22:31:04 2021 -0500 > > > > nfsd: fix use-after-free due to delegation race > > > > A delegation break could arrive as soon as we've called > > vfs_setlease. A > > delegation break runs a callback which immediately (in > > nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. > > If we > > then exit nfs4_set_delegation without hashing the delegation, it > > will be > > freed as soon as the callback is done with it, without ever being > > removed from del_recall_lru. > > > > Symptoms show up later as use-after-free or list corruption > > warnings, > > usually in the laundromat thread. > > > > Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > index bfad94c70b84..8e063f49240b 100644 > > --- a/fs/nfsd/nfs4state.c > > +++ b/fs/nfsd/nfs4state.c > > @@ -5159,15 +5159,16 @@ nfs4_set_delegation(struct nfs4_client *clp, > > struct svc_fh *fh, > > locks_free_lock(fl); > > if (status) > > goto out_clnt_odstate; > > + > > status = nfsd4_check_conflicting_opens(clp, fp); > > - if (status) > > - goto out_unlock; > > > > spin_lock(&state_lock); > > spin_lock(&fp->fi_lock); > > - if (fp->fi_had_conflict) > > + if (status || fp->fi_had_conflict) { > > + list_del_init(&dp->dl_recall_lru); > > + dp->dl_time++; > > status = -EAGAIN; > > - else > > + } else > > status = hash_delegation_locked(dp, fp); > > spin_unlock(&fp->fi_lock); > > spin_unlock(&state_lock); > > Why won't this leak a reference to the stid? I'm not seeing it. > Afaics nfsd_break_one_deleg() does take a reference before launching > the callback daemon, and that reference is released when the > delegation is later processed by the laundromat. Right. Basically, there are two "long-lived" references, one taken as long as the callback's using it, one taken as long as it's "hashed" (see hash_delegation_locked/unhash_delegation_locked). In the -EAGAIN case above, we're holding a temporary reference which will be dropped on exit from this function; if a callback's in progress, it will then drop the final reference. > Hmm... Isn't the real bug here rather that the laundromat is corrupting > both the nn->client_lru and nn->del_recall_lru lists because it is > using list_add() instead of list_move() when adding these objects to > the reaplist? If that were the problem, I think we'd be hitting it all the time. Looking.... No, unhash_delegation_locked did a list_del_init(). (Is the WARN_ON there too ugly?) --b.