Re: Leaked nfsd_file due to race condition and early unhash (fs/nfsd/filecache.c)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



No problem. Thanks.

On Thu, Jul 11, 2024 at 10:59 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> Great, thanks for testing them! Can we add a Tested-by: for you to the
> patches? Consider posting your testcase as well. Maybe we can add it to
> fstests or another testsuite.
>
> Thanks,
> Jeff
>
> On Thu, 2024-07-11 at 10:54 -0400, Youzhong Yang wrote:
> > I'd like to report that your patches + the "list_head nf_gc" patch
> > together have been tested under heavy nfs load generated by machines
> > in our testing farm, no leak has been observed.
> >
> > Surely my reproducer can no longer reproduce the issue once the
> > patches are applied.
> >
> > FYI here is the reproducer:
> > https://github.com/youzhongyang/nfsd-file-leaks
> >
> > On Wed, Jul 10, 2024 at 4:22 PM Jeff Layton <jlayton@xxxxxxxxxx>
> > wrote:
> > >
> > > Great! If you're able to test the patches I sent earlier today,
> > > please
> > > let us know if they help with your reproducer.
> > >
> > >
> > > On Wed, 2024-07-10 at 10:49 -0400, Youzhong Yang wrote:
> > > > Thanks Jeff. Just sent out a separate e-mail.
> > > >
> > > > I have a simple reproducer now, it's a C program, I will upload
> > > > it to
> > > > a github repository together with a readme file and then post the
> > > > link
> > > > (if allowed) here.
> > > >
> > > > On Wed, Jul 10, 2024 at 9:40 AM Jeff Layton <jlayton@xxxxxxxxxx>
> > > > wrote:
> > > > >
> > > > > Can you send it as a separate [PATCH] email? I think your MUA
> > > > > is
> > > > > mangling the patches as I couldn't apply it earlier. Either use
> > > > > git-
> > > > > send-email or have a look over this page:
> > > > >
> > > > >
> > > > >
> > > > > https://www.kernel.org/doc/html/latest/process/email-clients.html#email-clients
> > > > >
> > > > > Thanks!
> > > > > Jeff
> > > > >
> > > > > On Wed, 2024-07-10 at 09:33 -0400, Youzhong Yang wrote:
> > > > > > Thanks. Here it is again:
> > > > > >
> > > > > > commit c6e69cebc052cb82d91fc81a98aee20749fe8d47
> > > > > > Author: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > > > Date:   Thu Jul 4 11:25:40 2024 -0400
> > > > > >
> > > > > >     nfsd: fix nfsd_file leaking due to mixed use of nf-
> > > > > > >nf_lru
> > > > > >
> > > > > >     nfsd_file_put() in one thread can race with another
> > > > > > thread doing
> > > > > >     garbage collection (running nfsd_file_gc() ->
> > > > > > list_lru_walk() ->
> > > > > >     nfsd_file_lru_cb()):
> > > > > >
> > > > > >       * In nfsd_file_put(), nf->nf_ref is 1, so it tries to
> > > > > > do
> > > > > > nfsd_file_lru_add().
> > > > > >       * nfsd_file_lru_add() returns true (with
> > > > > > NFSD_FILE_REFERENCED
> > > > > > bit set)
> > > > > >       * garbage collector kicks in, nfsd_file_lru_cb() clears
> > > > > > REFERENCED bit and
> > > > > >         returns LRU_ROTATE.
> > > > > >       * garbage collector kicks in again, nfsd_file_lru_cb()
> > > > > > now
> > > > > > decrements nf->nf_ref
> > > > > >         to 0, runs nfsd_file_unhash(), removes it from the
> > > > > > LRU and
> > > > > > adds to the dispose
> > > > > >         list [list_lru_isolate_move(lru, &nf->nf_lru, head)]
> > > > > >       * nfsd_file_put() detects NFSD_FILE_HASHED bit is
> > > > > > cleared, so
> > > > > > it
> > > > > > tries to remove
> > > > > >         the 'nf' from the LRU [if
> > > > > > (!nfsd_file_lru_remove(nf))]. The
> > > > > > 'nf' has been added
> > > > > >         to the 'dispose' list by nfsd_file_lru_cb(), so
> > > > > > nfsd_file_lru_remove(nf) simply
> > > > > >         treats it as part of the LRU and removes it, which
> > > > > > leads to
> > > > > > its removal from
> > > > > >         the 'dispose' list.
> > > > > >       * At this moment, 'nf' is unhashed with its nf_ref
> > > > > > being 0, and
> > > > > > not on the LRU.
> > > > > >         nfsd_file_put() continues its execution [if
> > > > > > (refcount_dec_and_test(&nf->nf_ref))],
> > > > > >         as nf->nf_ref is already 0, nf->nf_ref is set to
> > > > > > REFCOUNT_SATURATED, and the 'nf'
> > > > > >         gets no chance of being freed.
> > > > > >
> > > > > >     nfsd_file_put() can also race with
> > > > > > nfsd_file_cond_queue():
> > > > > >       * In nfsd_file_put(), nf->nf_ref is 1, so it tries to
> > > > > > do
> > > > > > nfsd_file_lru_add().
> > > > > >       * nfsd_file_lru_add() sets REFERENCED bit and returns
> > > > > > true.
> > > > > >       * Some userland application runs 'exportfs -f' or
> > > > > > something
> > > > > > like
> > > > > > that, which triggers
> > > > > >         __nfsd_file_cache_purge() -> nfsd_file_cond_queue().
> > > > > >       * In nfsd_file_cond_queue(), it runs [if
> > > > > > (!nfsd_file_unhash(nf))], unhash is done
> > > > > >         successfully.
> > > > > >       * nfsd_file_cond_queue() runs [if
> > > > > > (!nfsd_file_get(nf))], now
> > > > > > nf->nf_ref goes to 2.
> > > > > >       * nfsd_file_cond_queue() runs [if
> > > > > > (nfsd_file_lru_remove(nf))],
> > > > > > it succeeds.
> > > > > >       * nfsd_file_cond_queue() runs [if
> > > > > > (refcount_sub_and_test(decrement, &nf->nf_ref))]
> > > > > >         (with "decrement" being 2), so the nf->nf_ref goes to
> > > > > > 0, the
> > > > > > 'nf' is added to the
> > > > > >         dispose list [list_add(&nf->nf_lru, dispose)]
> > > > > >       * nfsd_file_put() detects NFSD_FILE_HASHED bit is
> > > > > > cleared, so
> > > > > > it
> > > > > > tries to remove
> > > > > >         the 'nf' from the LRU [if
> > > > > > (!nfsd_file_lru_remove(nf))],
> > > > > > although the 'nf' is not
> > > > > >         in the LRU, but it is linked in the 'dispose' list,
> > > > > > nfsd_file_lru_remove() simply
> > > > > >         treats it as part of the LRU and removes it. This
> > > > > > leads to
> > > > > > its
> > > > > > removal from
> > > > > >         the 'dispose' list!
> > > > > >       * Now nf->ref is 0, unhashed. nfsd_file_put() continues
> > > > > > its
> > > > > > execution and set
> > > > > >         nf->nf_ref to REFCOUNT_SATURATED.
> > > > > >
> > > > > >     As shown in the above analysis, using nf_lru for both the
> > > > > > LRU
> > > > > > list
> > > > > > and dispose list
> > > > > >     can cause the leaks. This patch adds a new list_head
> > > > > > nf_gc in
> > > > > > struct nfsd_file, and uses
> > > > > >     it for the dispose list.
> > > > > >
> > > > > >     Signed-off-by: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > > >
> > > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > > > index ad9083ca144b..22ebd7fb8639 100644
> > > > > > --- a/fs/nfsd/filecache.c
> > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > @@ -216,6 +216,7 @@ nfsd_file_alloc(struct net *net, struct
> > > > > > inode
> > > > > > *inode, unsigned char need,
> > > > > >                 return NULL;
> > > > > >
> > > > > >         INIT_LIST_HEAD(&nf->nf_lru);
> > > > > > +       INIT_LIST_HEAD(&nf->nf_gc);
> > > > > >         nf->nf_birthtime = ktime_get();
> > > > > >         nf->nf_file = NULL;
> > > > > >         nf->nf_cred = get_current_cred();
> > > > > > @@ -393,8 +394,8 @@ nfsd_file_dispose_list(struct list_head
> > > > > > *dispose)
> > > > > >         struct nfsd_file *nf;
> > > > > >
> > > > > >         while (!list_empty(dispose)) {
> > > > > > -               nf = list_first_entry(dispose, struct
> > > > > > nfsd_file,
> > > > > > nf_lru);
> > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > +               nf = list_first_entry(dispose, struct
> > > > > > nfsd_file,
> > > > > > nf_gc);
> > > > > > +               list_del_init(&nf->nf_gc);
> > > > > >                 nfsd_file_free(nf);
> > > > > >         }
> > > > > >  }
> > > > > > @@ -411,12 +412,12 @@ nfsd_file_dispose_list_delayed(struct
> > > > > > list_head
> > > > > > *dispose)
> > > > > >  {
> > > > > >         while(!list_empty(dispose)) {
> > > > > >                 struct nfsd_file *nf =
> > > > > > list_first_entry(dispose,
> > > > > > -                                               struct
> > > > > > nfsd_file,
> > > > > > nf_lru);
> > > > > > +                                               struct
> > > > > > nfsd_file,
> > > > > > nf_gc);
> > > > > >                 struct nfsd_net *nn = net_generic(nf->nf_net,
> > > > > > nfsd_net_id);
> > > > > >                 struct nfsd_fcache_disposal *l = nn-
> > > > > > >fcache_disposal;
> > > > > >
> > > > > >                 spin_lock(&l->lock);
> > > > > > -               list_move_tail(&nf->nf_lru, &l->freeme);
> > > > > > +               list_move_tail(&nf->nf_gc, &l->freeme);
> > > > > >                 spin_unlock(&l->lock);
> > > > > >                 svc_wake_up(nn->nfsd_serv);
> > > > > >         }
> > > > > > @@ -503,7 +504,8 @@ nfsd_file_lru_cb(struct list_head *item,
> > > > > > struct
> > > > > > list_lru_one *lru,
> > > > > >
> > > > > >         /* Refcount went to zero. Unhash it and queue it to
> > > > > > the
> > > > > > dispose list */
> > > > > >         nfsd_file_unhash(nf);
> > > > > > -       list_lru_isolate_move(lru, &nf->nf_lru, head);
> > > > > > +       list_lru_isolate(lru, &nf->nf_lru);
> > > > > > +       list_add(&nf->nf_gc, head);
> > > > > >         this_cpu_inc(nfsd_file_evictions);
> > > > > >         trace_nfsd_file_gc_disposed(nf);
> > > > > >         return LRU_REMOVED;
> > > > > > @@ -578,7 +580,7 @@ nfsd_file_cond_queue(struct nfsd_file
> > > > > > *nf, struct
> > > > > > list_head *dispose)
> > > > > >
> > > > > >         /* If refcount goes to 0, then put on the dispose
> > > > > > list */
> > > > > >         if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > > > > > -               list_add(&nf->nf_lru, dispose);
> > > > > > +               list_add(&nf->nf_gc, dispose);
> > > > > >                 trace_nfsd_file_closing(nf);
> > > > > >         }
> > > > > >  }
> > > > > > @@ -654,8 +656,8 @@ nfsd_file_close_inode_sync(struct inode
> > > > > > *inode)
> > > > > >
> > > > > >         nfsd_file_queue_for_close(inode, &dispose);
> > > > > >         while (!list_empty(&dispose)) {
> > > > > > -               nf = list_first_entry(&dispose, struct
> > > > > > nfsd_file,
> > > > > > nf_lru);
> > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > +               nf = list_first_entry(&dispose, struct
> > > > > > nfsd_file,
> > > > > > nf_gc);
> > > > > > +               list_del_init(&nf->nf_gc);
> > > > > >                 nfsd_file_free(nf);
> > > > > >         }
> > > > > >  }
> > > > > > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > > > > > index c61884def906..3fbec24eea6c 100644
> > > > > > --- a/fs/nfsd/filecache.h
> > > > > > +++ b/fs/nfsd/filecache.h
> > > > > > @@ -44,6 +44,7 @@ struct nfsd_file {
> > > > > >
> > > > > >         struct nfsd_file_mark   *nf_mark;
> > > > > >         struct list_head        nf_lru;
> > > > > > +       struct list_head        nf_gc;
> > > > > >         struct rcu_head         nf_rcu;
> > > > > >         ktime_t                 nf_birthtime;
> > > > > >  };
> > > > > >
> > > > > > On Wed, Jul 10, 2024 at 8:49 AM Jeff Layton
> > > > > > <jlayton@xxxxxxxxxx>
> > > > > > wrote:
> > > > > > >
> > > > > > > Ok, thanks. I went crawling over the filecache code, and
> > > > > > > found a
> > > > > > > couple
> > > > > > > of potential nfsd_file leaks. I'm testing patches now and
> > > > > > > will
> > > > > > > likely
> > > > > > > send them along later today.
> > > > > > >
> > > > > > > I think we do still want the patch to add nf_gc list_head.
> > > > > > > I'm not
> > > > > > > certain there is a race there, but it's safer to just use a
> > > > > > > separate
> > > > > > > list_head and it doesn't represent a lot of memory.
> > > > > > >
> > > > > > > Could you post that one individually so that Chuck can pick
> > > > > > > it up?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jeff
> > > > > > >
> > > > > > > On Tue, 2024-07-09 at 15:13 -0400, Youzhong Yang wrote:
> > > > > > > > It's not on the LRU:
> > > > > > > >
> > > > > > > > crash> struct
> > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru,nf_gc
> > > > > > > > ffff898107b95980
> > > > > > > >   nf_flags = 12,
> > > > > > > >   nf_ref.refs.counter = 1
> > > > > > > >   nf_lru = {
> > > > > > > >     next = 0xffff898107b959c8,
> > > > > > > >     prev = 0xffff898107b959c8
> > > > > > > >   },
> > > > > > > >   nf_gc = {
> > > > > > > >     next = 0xffff898107b959d8,
> > > > > > > >     prev = 0xffff898107b959d8
> > > > > > > >   },
> > > > > > > >
> > > > > > > > I don't have an easy reproducer now as I am having
> > > > > > > > difficulty
> > > > > > > > generating tons of NFS read/write ops using a limited
> > > > > > > > number of
> > > > > > > > nfs
> > > > > > > > clients. I use our in-house testing farm to test the
> > > > > > > > patch.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > On Tue, Jul 9, 2024 at 3:05 PM Jeff Layton
> > > > > > > > <jlayton@xxxxxxxxxx>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 2024-07-09 at 14:37 -0400, Youzhong Yang wrote:
> > > > > > > > > > Thanks Jeff.
> > > > > > > > > >
> > > > > > > > > > Unfortunately the early unhash easily leads to leaks:
> > > > > > > > > >
> > > > > > > > > > crash> kmem -S nfsd_file | grep '\[ffff' | sed -e
> > > > > > > > > > 's|\[||' -e
> > > > > > > > > > 's|\]||'
> > > > > > > > > > > xargs -i echo struct
> > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > > '{}' >
> > > > > > > > > > /var/tmp/nfsd_files
> > > > > > > > > > crash> !wc -l /var/tmp/nfsd_files
> > > > > > > > > > 19 /var/tmp/nfsd_files
> > > > > > > > > > crash> < /var/tmp/nfsd_files
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff88865c778900
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 2
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff88865c778cc0
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 3
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff8885d5f35e00
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 1
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff88817443e780
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 3
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff88818b3f0600
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 2
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff88a4490f8300
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 1
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff88a0dab183c0
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 40
> > > > > > > > >
> > > > > > > > > That's a lot of references! Might be interesting to
> > > > > > > > > look more
> > > > > > > > > closely
> > > > > > > > > at that one, but the refcounts are all over the place,
> > > > > > > > > so it
> > > > > > > > > really
> > > > > > > > > does look like we just have a refcount leak somewhere.
> > > > > > > > >
> > > > > > > > > > ...
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff89209535f200
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 2
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff8980e15138c0
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 7
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff898107b95800
> > > > > > > > > >   nf_flags = 8,
> > > > > > > > > >   nf_ref.refs.counter = 3
> > > > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > > ffff898107b95980
> > > > > > > > > >   nf_flags = 12,
> > > > > > > > >
> > > > > > > > > The others are NFSD_FILE_GC. This one is also
> > > > > > > > > NFSD_FILE_REFERENCED. Are
> > > > > > > > > these objects on the LRU?
> > > > > > > > >
> > > > > > > > > >   nf_ref.refs.counter = 1
> > > > > > > > > >
> > > > > > > > > > nfsd_file_do_acquire() -> nfsd_file_lookup_locked()
> > > > > > > > > > relies on
> > > > > > > > > > the
> > > > > > > > > > hash
> > > > > > > > > > table to find the nfsd_file,
> > > > > > > > > > but I am still scratching my head why and how this
> > > > > > > > > > happens.
> > > > > > > > > >
> > > > > > > > > > FYI, here is the patch I applied for testing:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > The above suggests to me that there is a garden-variety
> > > > > > > > > refcount leak
> > > > > > > > > somewhere. Whenever some bit of the code takes a
> > > > > > > > > reference to
> > > > > > > > > an object
> > > > > > > > > (like a nfsd_file), it's implicitly required to
> > > > > > > > > eventually put
> > > > > > > > > that
> > > > > > > > > reference. Typically that means that the code needs to
> > > > > > > > > maintain
> > > > > > > > > a
> > > > > > > > > pointer to that object, as it can be unhashed at any
> > > > > > > > > time and
> > > > > > > > > it can't
> > > > > > > > > rely on finding the same object later.
> > > > > > > > >
> > > > > > > > > Do you have a reproducer for this?
> > > > > > > > >
> > > > > > > > > > diff --git a/fs/nfsd/filecache.c
> > > > > > > > > > b/fs/nfsd/filecache.c
> > > > > > > > > > index ad9083ca144b..22ebd7fb8639 100644
> > > > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > > > @@ -216,6 +216,7 @@ nfsd_file_alloc(struct net *net,
> > > > > > > > > > struct
> > > > > > > > > > inode
> > > > > > > > > > *inode, unsigned char need,
> > > > > > > > > >                 return NULL;
> > > > > > > > > >
> > > > > > > > > >         INIT_LIST_HEAD(&nf->nf_lru);
> > > > > > > > > > +       INIT_LIST_HEAD(&nf->nf_gc);
> > > > > > > > > >         nf->nf_birthtime = ktime_get();
> > > > > > > > > >         nf->nf_file = NULL;
> > > > > > > > > >         nf->nf_cred = get_current_cred();
> > > > > > > > > > @@ -393,8 +394,8 @@ nfsd_file_dispose_list(struct
> > > > > > > > > > list_head
> > > > > > > > > > *dispose)
> > > > > > > > > >         struct nfsd_file *nf;
> > > > > > > > > >
> > > > > > > > > >         while (!list_empty(dispose)) {
> > > > > > > > > > -               nf = list_first_entry(dispose, struct
> > > > > > > > > > nfsd_file,
> > > > > > > > > > nf_lru);
> > > > > > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > > > > > +               nf = list_first_entry(dispose, struct
> > > > > > > > > > nfsd_file,
> > > > > > > > > > nf_gc);
> > > > > > > > > > +               list_del_init(&nf->nf_gc);
> > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > >         }
> > > > > > > > > >  }
> > > > > > > > > > @@ -411,12 +412,12 @@
> > > > > > > > > > nfsd_file_dispose_list_delayed(struct
> > > > > > > > > > list_head
> > > > > > > > > > *dispose)
> > > > > > > > > >  {
> > > > > > > > > >         while(!list_empty(dispose)) {
> > > > > > > > > >                 struct nfsd_file *nf =
> > > > > > > > > > list_first_entry(dispose,
> > > > > > > > > > -
> > > > > > > > > > struct
> > > > > > > > > > nfsd_file,
> > > > > > > > > > nf_lru);
> > > > > > > > > > +
> > > > > > > > > > struct
> > > > > > > > > > nfsd_file,
> > > > > > > > > > nf_gc);
> > > > > > > > > >                 struct nfsd_net *nn = net_generic(nf-
> > > > > > > > > > >nf_net,
> > > > > > > > > > nfsd_net_id);
> > > > > > > > > >                 struct nfsd_fcache_disposal *l = nn-
> > > > > > > > > > > fcache_disposal;
> > > > > > > > > >
> > > > > > > > > >                 spin_lock(&l->lock);
> > > > > > > > > > -               list_move_tail(&nf->nf_lru, &l-
> > > > > > > > > > >freeme);
> > > > > > > > > > +               list_move_tail(&nf->nf_gc, &l-
> > > > > > > > > > >freeme);
> > > > > > > > > >                 spin_unlock(&l->lock);
> > > > > > > > > >                 svc_wake_up(nn->nfsd_serv);
> > > > > > > > > >         }
> > > > > > > > > > @@ -503,7 +504,8 @@ nfsd_file_lru_cb(struct list_head
> > > > > > > > > > *item,
> > > > > > > > > > struct
> > > > > > > > > > list_lru_one *lru,
> > > > > > > > > >
> > > > > > > > > >         /* Refcount went to zero. Unhash it and queue
> > > > > > > > > > it to
> > > > > > > > > > the
> > > > > > > > > > dispose list */
> > > > > > > > > >         nfsd_file_unhash(nf);
> > > > > > > > > > -       list_lru_isolate_move(lru, &nf->nf_lru,
> > > > > > > > > > head);
> > > > > > > > > > +       list_lru_isolate(lru, &nf->nf_lru);
> > > > > > > > > > +       list_add(&nf->nf_gc, head);
> > > > > > > > > >         this_cpu_inc(nfsd_file_evictions);
> > > > > > > > > >         trace_nfsd_file_gc_disposed(nf);
> > > > > > > > > >         return LRU_REMOVED;
> > > > > > > > > > @@ -578,7 +580,7 @@ nfsd_file_cond_queue(struct
> > > > > > > > > > nfsd_file
> > > > > > > > > > *nf, struct
> > > > > > > > > > list_head *dispose)
> > > > > > > > > >
> > > > > > > > > >         /* If refcount goes to 0, then put on the
> > > > > > > > > > dispose
> > > > > > > > > > list */
> > > > > > > > > >         if (refcount_sub_and_test(decrement, &nf-
> > > > > > > > > > >nf_ref)) {
> > > > > > > > > > -               list_add(&nf->nf_lru, dispose);
> > > > > > > > > > +               list_add(&nf->nf_gc, dispose);
> > > > > > > > > >                 trace_nfsd_file_closing(nf);
> > > > > > > > > >         }
> > > > > > > > > >  }
> > > > > > > > > > @@ -654,8 +656,8 @@ nfsd_file_close_inode_sync(struct
> > > > > > > > > > inode
> > > > > > > > > > *inode)
> > > > > > > > > >
> > > > > > > > > >         nfsd_file_queue_for_close(inode, &dispose);
> > > > > > > > > >         while (!list_empty(&dispose)) {
> > > > > > > > > > -               nf = list_first_entry(&dispose,
> > > > > > > > > > struct
> > > > > > > > > > nfsd_file,
> > > > > > > > > > nf_lru);
> > > > > > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > > > > > +               nf = list_first_entry(&dispose,
> > > > > > > > > > struct
> > > > > > > > > > nfsd_file,
> > > > > > > > > > nf_gc);
> > > > > > > > > > +               list_del_init(&nf->nf_gc);
> > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > >         }
> > > > > > > > > >  }
> > > > > > > > > > diff --git a/fs/nfsd/filecache.h
> > > > > > > > > > b/fs/nfsd/filecache.h
> > > > > > > > > > index c61884def906..3fbec24eea6c 100644
> > > > > > > > > > --- a/fs/nfsd/filecache.h
> > > > > > > > > > +++ b/fs/nfsd/filecache.h
> > > > > > > > > > @@ -44,6 +44,7 @@ struct nfsd_file {
> > > > > > > > > >
> > > > > > > > > >         struct nfsd_file_mark   *nf_mark;
> > > > > > > > > >         struct list_head        nf_lru;
> > > > > > > > > > +       struct list_head        nf_gc;
> > > > > > > > > >         struct rcu_head         nf_rcu;
> > > > > > > > > >         ktime_t                 nf_birthtime;
> > > > > > > > > >  };
> > > > > > > > > >
> > > > > > > > > > On Mon, Jul 8, 2024 at 10:55 AM Jeff Layton
> > > > > > > > > > <jlayton@xxxxxxxxxx>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 2024-07-08 at 10:23 -0400, Youzhong Yang
> > > > > > > > > > > wrote:
> > > > > > > > > > > > Thanks Jeff.
> > > > > > > > > > > >
> > > > > > > > > > > > I am ok with reverting the unhash/dispose list
> > > > > > > > > > > > reordering
> > > > > > > > > > > > in
> > > > > > > > > > > > nfsd_file_lru_cb(), as it doesn't make much of a
> > > > > > > > > > > > difference,
> > > > > > > > > > > > but for nfsd_file_cond_queue(), imagining this:
> > > > > > > > > > > >
> > > > > > > > > > > > - A nfsd_file is hashed
> > > > > > > > > > > > - In nfsd_file_cond_queue(), [if
> > > > > > > > > > > > (!nfsd_file_unhash(nf))]
> > > > > > > > > > > > will
> > > > > > > > > > > > get it
> > > > > > > > > > > > unhashed, doesn't it?
> > > > > > > > > > > > - It continues to get a reference by
> > > > > > > > > > > > nfsd_file_get()
> > > > > > > > > > > > - It continues to remove itself from LRU by
> > > > > > > > > > > > nfsd_file_lru_remove() if
> > > > > > > > > > > > it is on the LRU.
> > > > > > > > > > > > - Now it runs refcount_sub_and_test(), what
> > > > > > > > > > > > happens if
> > > > > > > > > > > > the refcnt
> > > > > > > > > > > > does
> > > > > > > > > > > > not go to 0? How can this nfsd_file be found
> > > > > > > > > > > > again?
> > > > > > > > > > > > Through the
> > > > > > > > > > > > hash
> > > > > > > > > > > > table? Through the LRU walk? how?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks again.
> > > > > > > > > > > >
> > > > > > > > > > > > -Youzhong
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > It won't need to be found again. The holders of the
> > > > > > > > > > > extra
> > > > > > > > > > > references
> > > > > > > > > > > will put those references when they are finished.
> > > > > > > > > > > Since the
> > > > > > > > > > > object
> > > > > > > > > > > is
> > > > > > > > > > > no longer HASHED, nfsd_file_put just does this:
> > > > > > > > > > >
> > > > > > > > > > >         if (refcount_dec_and_test(&nf->nf_ref))
> > > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > > >
> > > > > > > > > > > So that should be fine.
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jul 8, 2024 at 9:35 AM Jeff Layton
> > > > > > > > > > > > <jlayton@xxxxxxxxxx>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 2024-07-08 at 08:58 -0400, Youzhong
> > > > > > > > > > > > > Yang wrote:
> > > > > > > > > > > > > > Thank you Jeff for your invaluable insights.
> > > > > > > > > > > > > > I was
> > > > > > > > > > > > > > leaning
> > > > > > > > > > > > > > towards
> > > > > > > > > > > > > > adding a new list_head too, and tested this
> > > > > > > > > > > > > > approach
> > > > > > > > > > > > > > on
> > > > > > > > > > > > > > kernel
> > > > > > > > > > > > > > 6.6 by
> > > > > > > > > > > > > > continuously hammering the server with heavy
> > > > > > > > > > > > > > nfs load
> > > > > > > > > > > > > > for the
> > > > > > > > > > > > > > last few
> > > > > > > > > > > > > > days, not a single leak.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Here goes the patch (based on Linux kernel
> > > > > > > > > > > > > > master
> > > > > > > > > > > > > > branch),
> > > > > > > > > > > > > > please
> > > > > > > > > > > > > > review:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > From: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > > > > > > > > > > > Date: Thu, 4 Jul 2024 11:25:40 -0400
> > > > > > > > > > > > > > Subject: [PATCH] nfsd: fix nfsd_file leaking
> > > > > > > > > > > > > > due to
> > > > > > > > > > > > > > mixed use
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > nf->nf_lru
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > nfsd_file_put() in one thread can race with
> > > > > > > > > > > > > > another
> > > > > > > > > > > > > > thread
> > > > > > > > > > > > > > doing
> > > > > > > > > > > > > > garbage collection (running nfsd_file_gc() ->
> > > > > > > > > > > > > > list_lru_walk()
> > > > > > > > > > > > > > ->
> > > > > > > > > > > > > > nfsd_file_lru_cb()):
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   * In nfsd_file_put(), nf->nf_ref is 1, so
> > > > > > > > > > > > > > it tries
> > > > > > > > > > > > > > to do
> > > > > > > > > > > > > > nfsd_file_lru_add().
> > > > > > > > > > > > > >   * nfsd_file_lru_add() returns true (with
> > > > > > > > > > > > > > NFSD_FILE_REFERENCED
> > > > > > > > > > > > > > bit set)
> > > > > > > > > > > > > >   * garbage collector kicks in,
> > > > > > > > > > > > > > nfsd_file_lru_cb()
> > > > > > > > > > > > > > clears
> > > > > > > > > > > > > > REFERENCED bit and
> > > > > > > > > > > > > >     returns LRU_ROTATE.
> > > > > > > > > > > > > >   * garbage collector kicks in again,
> > > > > > > > > > > > > > nfsd_file_lru_cb() now
> > > > > > > > > > > > > > decrements nf->nf_ref
> > > > > > > > > > > > > >     to 0, runs nfsd_file_unhash(), removes it
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > the LRU
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > adds to
> > > > > > > > > > > > > > the dispose
> > > > > > > > > > > > > >     list [list_lru_isolate_move(lru, &nf-
> > > > > > > > > > > > > > >nf_lru,
> > > > > > > > > > > > > > head)]
> > > > > > > > > > > > > >   * nfsd_file_put() detects NFSD_FILE_HASHED
> > > > > > > > > > > > > > bit is
> > > > > > > > > > > > > > cleared,
> > > > > > > > > > > > > > so
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > tries to remove
> > > > > > > > > > > > > >     the 'nf' from the LRU [if
> > > > > > > > > > > > > > (!nfsd_file_lru_remove(nf))].
> > > > > > > > > > > > > > The
> > > > > > > > > > > > > > 'nf'
> > > > > > > > > > > > > > has been added
> > > > > > > > > > > > > >     to the 'dispose' list by
> > > > > > > > > > > > > > nfsd_file_lru_cb(), so
> > > > > > > > > > > > > > nfsd_file_lru_remove(nf) simply
> > > > > > > > > > > > > >     treats it as part of the LRU and removes
> > > > > > > > > > > > > > it,
> > > > > > > > > > > > > > which leads
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > its removal from
> > > > > > > > > > > > > >     the 'dispose' list.
> > > > > > > > > > > > > >   * At this moment, 'nf' is unhashed with its
> > > > > > > > > > > > > > nf_ref
> > > > > > > > > > > > > > being 0,
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > not
> > > > > > > > > > > > > > on the LRU.
> > > > > > > > > > > > > >     nfsd_file_put() continues its execution
> > > > > > > > > > > > > > [if
> > > > > > > > > > > > > > (refcount_dec_and_test(&nf->nf_ref))],
> > > > > > > > > > > > > >     as nf->nf_ref is already 0, nf->nf_ref is
> > > > > > > > > > > > > > set to
> > > > > > > > > > > > > > REFCOUNT_SATURATED, and the 'nf'
> > > > > > > > > > > > > >     gets no chance of being freed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > nfsd_file_put() can also race with
> > > > > > > > > > > > > > nfsd_file_cond_queue():
> > > > > > > > > > > > > >   * In nfsd_file_put(), nf->nf_ref is 1, so
> > > > > > > > > > > > > > it tries
> > > > > > > > > > > > > > to do
> > > > > > > > > > > > > > nfsd_file_lru_add().
> > > > > > > > > > > > > >   * nfsd_file_lru_add() sets REFERENCED bit
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > returns true.
> > > > > > > > > > > > > >   * Some userland application runs 'exportfs
> > > > > > > > > > > > > > -f' or
> > > > > > > > > > > > > > something
> > > > > > > > > > > > > > like
> > > > > > > > > > > > > > that, which triggers
> > > > > > > > > > > > > >     __nfsd_file_cache_purge() ->
> > > > > > > > > > > > > > nfsd_file_cond_queue().
> > > > > > > > > > > > > >   * In nfsd_file_cond_queue(), it runs [if
> > > > > > > > > > > > > > (!nfsd_file_unhash(nf))],
> > > > > > > > > > > > > > unhash is done
> > > > > > > > > > > > > >     successfully.
> > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > (!nfsd_file_get(nf))],
> > > > > > > > > > > > > > now
> > > > > > > > > > > > > > nf->nf_ref goes to 2.
> > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > (nfsd_file_lru_remove(nf))],
> > > > > > > > > > > > > > it succeeds.
> > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > (refcount_sub_and_test(decrement,
> > > > > > > > > > > > > > &nf->nf_ref))]
> > > > > > > > > > > > > >     (with "decrement" being 2), so the nf-
> > > > > > > > > > > > > > >nf_ref
> > > > > > > > > > > > > > goes to 0,
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > 'nf'
> > > > > > > > > > > > > > is added to the
> > > > > > > > > > > > > >     dispose list [list_add(&nf->nf_lru,
> > > > > > > > > > > > > > dispose)]
> > > > > > > > > > > > > >   * nfsd_file_put() detects NFSD_FILE_HASHED
> > > > > > > > > > > > > > bit is
> > > > > > > > > > > > > > cleared,
> > > > > > > > > > > > > > so
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > tries to remove
> > > > > > > > > > > > > >     the 'nf' from the LRU [if
> > > > > > > > > > > > > > (!nfsd_file_lru_remove(nf))],
> > > > > > > > > > > > > > although
> > > > > > > > > > > > > > the 'nf' is not
> > > > > > > > > > > > > >     in the LRU, but it is linked in the
> > > > > > > > > > > > > > 'dispose'
> > > > > > > > > > > > > > list,
> > > > > > > > > > > > > > nfsd_file_lru_remove() simply
> > > > > > > > > > > > > >     treats it as part of the LRU and removes
> > > > > > > > > > > > > > it. This
> > > > > > > > > > > > > > leads
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > its removal from
> > > > > > > > > > > > > >     the 'dispose' list!
> > > > > > > > > > > > > >   * Now nf->ref is 0, unhashed.
> > > > > > > > > > > > > > nfsd_file_put()
> > > > > > > > > > > > > > continues its
> > > > > > > > > > > > > > execution and set
> > > > > > > > > > > > > >     nf->nf_ref to REFCOUNT_SATURATED.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As shown in the above analysis, using nf_lru
> > > > > > > > > > > > > > for both
> > > > > > > > > > > > > > the LRU
> > > > > > > > > > > > > > list and
> > > > > > > > > > > > > > dispose list
> > > > > > > > > > > > > > can cause the leaks. This patch adds a new
> > > > > > > > > > > > > > list_head
> > > > > > > > > > > > > > nf_gc in
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > nfsd_file, and uses
> > > > > > > > > > > > > > it for the dispose list. It's not expected to
> > > > > > > > > > > > > > have a
> > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > unhashed but it's not
> > > > > > > > > > > > > > added to the dispose list, so in
> > > > > > > > > > > > > > nfsd_file_cond_queue() and
> > > > > > > > > > > > > > nfsd_file_lru_cb() nfsd_file
> > > > > > > > > > > > > > is unhashed after being added to the dispose
> > > > > > > > > > > > > > list.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't see where we require the object to be
> > > > > > > > > > > > > either
> > > > > > > > > > > > > hashed or
> > > > > > > > > > > > > on
> > > > > > > > > > > > > the
> > > > > > > > > > > > > dispose list.  I think you probably just want
> > > > > > > > > > > > > to do a
> > > > > > > > > > > > > patch
> > > > > > > > > > > > > that
> > > > > > > > > > > > > changes the dispose list to use a dedicated
> > > > > > > > > > > > > list_head
> > > > > > > > > > > > > without
> > > > > > > > > > > > > reordering when the these things are unhashed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Youzhong Yang
> > > > > > > > > > > > > > <youzhong@xxxxxxxxx>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  fs/nfsd/filecache.c | 23 ++++++++++++++-----
> > > > > > > > > > > > > > ----
> > > > > > > > > > > > > >  fs/nfsd/filecache.h |  1 +
> > > > > > > > > > > > > >  2 files changed, 15 insertions(+), 9
> > > > > > > > > > > > > > deletions(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > index ad9083ca144b..3aef2ddfce94 100644
> > > > > > > > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > @@ -216,6 +216,7 @@ nfsd_file_alloc(struct
> > > > > > > > > > > > > > net *net,
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > inode
> > > > > > > > > > > > > > *inode, unsigned char need,
> > > > > > > > > > > > > >                 return NULL;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         INIT_LIST_HEAD(&nf->nf_lru);
> > > > > > > > > > > > > > +       INIT_LIST_HEAD(&nf->nf_gc);
> > > > > > > > > > > > > >         nf->nf_birthtime = ktime_get();
> > > > > > > > > > > > > >         nf->nf_file = NULL;
> > > > > > > > > > > > > >         nf->nf_cred = get_current_cred();
> > > > > > > > > > > > > > @@ -393,8 +394,8 @@
> > > > > > > > > > > > > > nfsd_file_dispose_list(struct
> > > > > > > > > > > > > > list_head
> > > > > > > > > > > > > > *dispose)
> > > > > > > > > > > > > >         struct nfsd_file *nf;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         while (!list_empty(dispose)) {
> > > > > > > > > > > > > > -               nf =
> > > > > > > > > > > > > > list_first_entry(dispose, struct
> > > > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > > > nf_lru);
> > > > > > > > > > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > > > > > > > > > +               nf =
> > > > > > > > > > > > > > list_first_entry(dispose, struct
> > > > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > > > nf_gc);
> > > > > > > > > > > > > > +               list_del_init(&nf->nf_gc);
> > > > > > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > @@ -411,12 +412,12 @@
> > > > > > > > > > > > > > nfsd_file_dispose_list_delayed(struct
> > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > >  {
> > > > > > > > > > > > > >         while(!list_empty(dispose)) {
> > > > > > > > > > > > > >                 struct nfsd_file *nf =
> > > > > > > > > > > > > > list_first_entry(dispose,
> > > > > > > > > > > > > > -
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > > > nf_lru);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > > > nf_gc);
> > > > > > > > > > > > > >                 struct nfsd_net *nn =
> > > > > > > > > > > > > > net_generic(nf-
> > > > > > > > > > > > > > > nf_net,
> > > > > > > > > > > > > > nfsd_net_id);
> > > > > > > > > > > > > >                 struct nfsd_fcache_disposal
> > > > > > > > > > > > > > *l = nn-
> > > > > > > > > > > > > > > fcache_disposal;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >                 spin_lock(&l->lock);
> > > > > > > > > > > > > > -               list_move_tail(&nf->nf_lru,
> > > > > > > > > > > > > > &l-
> > > > > > > > > > > > > > > freeme);
> > > > > > > > > > > > > > +               list_move_tail(&nf->nf_gc,
> > > > > > > > > > > > > > &l-
> > > > > > > > > > > > > > > freeme);
> > > > > > > > > > > > > >                 spin_unlock(&l->lock);
> > > > > > > > > > > > > >                 svc_wake_up(nn->nfsd_serv);
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > @@ -502,8 +503,10 @@ nfsd_file_lru_cb(struct
> > > > > > > > > > > > > > list_head *item,
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > list_lru_one *lru,
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         /* Refcount went to zero. Unhash it
> > > > > > > > > > > > > > and queue
> > > > > > > > > > > > > > it to
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > dispose list */
> > > > > > > > > > > > > > +       list_lru_isolate(lru, &nf->nf_lru);
> > > > > > > > > > > > > > +       list_add(&nf->nf_gc, head);
> > > > > > > > > > > > > > +       /* Unhash after removing from LRU and
> > > > > > > > > > > > > > adding
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > list */
> > > > > > > > > > > > > >         nfsd_file_unhash(nf);
> > > > > > > > > > > > > > -       list_lru_isolate_move(lru, &nf-
> > > > > > > > > > > > > > >nf_lru,
> > > > > > > > > > > > > > head);
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't see the point in reordering these
> > > > > > > > > > > > > operations.
> > > > > > > > > > > > > Hashing
> > > > > > > > > > > > > is
> > > > > > > > > > > > > all
> > > > > > > > > > > > > about making the thing findable by nfsd
> > > > > > > > > > > > > operations. The
> > > > > > > > > > > > > _last_
> > > > > > > > > > > > > thing we
> > > > > > > > > > > > > want to do is put it on the dispose list while
> > > > > > > > > > > > > the
> > > > > > > > > > > > > thing can
> > > > > > > > > > > > > still
> > > > > > > > > > > > > be
> > > > > > > > > > > > > found by nfsd threads doing operations.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >         this_cpu_inc(nfsd_file_evictions);
> > > > > > > > > > > > > >         trace_nfsd_file_gc_disposed(nf);
> > > > > > > > > > > > > >         return LRU_REMOVED;
> > > > > > > > > > > > > > @@ -565,7 +568,7 @@
> > > > > > > > > > > > > > nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > *nf,
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > >         int decrement = 1;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         /* If we raced with someone else
> > > > > > > > > > > > > > unhashing,
> > > > > > > > > > > > > > ignore it
> > > > > > > > > > > > > > */
> > > > > > > > > > > > > > -       if (!nfsd_file_unhash(nf))
> > > > > > > > > > > > > > +       if (!test_bit(NFSD_FILE_HASHED, &nf-
> > > > > > > > > > > > > > > nf_flags))
> > > > > > > > > > > > >
> > > > > > > > > > > > > The above change looks wrong. I don't think we
> > > > > > > > > > > > > need to
> > > > > > > > > > > > > change
> > > > > > > > > > > > > this.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >                 return;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         /* If we can't get a reference,
> > > > > > > > > > > > > > ignore it */
> > > > > > > > > > > > > > @@ -578,7 +581,9 @@
> > > > > > > > > > > > > > nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > *nf,
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         /* If refcount goes to 0, then put on
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > list */
> > > > > > > > > > > > > >         if (refcount_sub_and_test(decrement,
> > > > > > > > > > > > > > &nf-
> > > > > > > > > > > > > > > nf_ref)) {
> > > > > > > > > > > > > > -               list_add(&nf->nf_lru,
> > > > > > > > > > > > > > dispose);
> > > > > > > > > > > > > > +               list_add(&nf->nf_gc,
> > > > > > > > > > > > > > dispose);
> > > > > > > > > > > > > > +               /* Unhash after adding to
> > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > list */
> > > > > > > > > > > > > > +               nfsd_file_unhash(nf);
> > > > > > > > > > > > >
> > > > > > > > > > > > > This too looks wrong? Maybe I'm unclear on the
> > > > > > > > > > > > > race
> > > > > > > > > > > > > you're
> > > > > > > > > > > > > trying
> > > > > > > > > > > > > to
> > > > > > > > > > > > > fix with this? What's the harm in unhashing it
> > > > > > > > > > > > > early?
> > > > > > > > > > > > >
> > > > > > > > > > > > > >                 trace_nfsd_file_closing(nf);
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > @@ -654,8 +659,8 @@
> > > > > > > > > > > > > > nfsd_file_close_inode_sync(struct
> > > > > > > > > > > > > > inode
> > > > > > > > > > > > > > *inode)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         nfsd_file_queue_for_close(inode,
> > > > > > > > > > > > > > &dispose);
> > > > > > > > > > > > > >         while (!list_empty(&dispose)) {
> > > > > > > > > > > > > > -               nf =
> > > > > > > > > > > > > > list_first_entry(&dispose,
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > > > nf_lru);
> > > > > > > > > > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > > > > > > > > > +               nf =
> > > > > > > > > > > > > > list_first_entry(&dispose,
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > > > nf_gc);
> > > > > > > > > > > > > > +               list_del_init(&nf->nf_gc);
> > > > > > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > diff --git a/fs/nfsd/filecache.h
> > > > > > > > > > > > > > b/fs/nfsd/filecache.h
> > > > > > > > > > > > > > index c61884def906..3fbec24eea6c 100644
> > > > > > > > > > > > > > --- a/fs/nfsd/filecache.h
> > > > > > > > > > > > > > +++ b/fs/nfsd/filecache.h
> > > > > > > > > > > > > > @@ -44,6 +44,7 @@ struct nfsd_file {
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         struct nfsd_file_mark   *nf_mark;
> > > > > > > > > > > > > >         struct list_head        nf_lru;
> > > > > > > > > > > > > > +       struct list_head        nf_gc;
> > > > > > > > > > > > > >         struct rcu_head         nf_rcu;
> > > > > > > > > > > > > >         ktime_t                 nf_birthtime;
> > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.34.1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jul 4, 2024 at 7:14 AM Jeff Layton
> > > > > > > > > > > > > > <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, 2024-07-03 at 16:46 -0400, Youzhong
> > > > > > > > > > > > > > > Yang
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > Thank you Chuck. Here are my quick
> > > > > > > > > > > > > > > > answers to
> > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > comments:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - I don't have a quick reproducer. I
> > > > > > > > > > > > > > > > reproduced
> > > > > > > > > > > > > > > > it by
> > > > > > > > > > > > > > > > using
> > > > > > > > > > > > > > > > hundreds
> > > > > > > > > > > > > > > > of nfs clients generating +600K ops under
> > > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > workload in
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > testing
> > > > > > > > > > > > > > > > environment. Theoretically it should be
> > > > > > > > > > > > > > > > possible
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > simplify
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > reproduction but I am still working on
> > > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -  I understand zfs is an out-of-tree
> > > > > > > > > > > > > > > > file
> > > > > > > > > > > > > > > > system. That's
> > > > > > > > > > > > > > > > fine. But
> > > > > > > > > > > > > > > > this leaking can happen to any file
> > > > > > > > > > > > > > > > system, and
> > > > > > > > > > > > > > > > leaking
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > not a good
> > > > > > > > > > > > > > > > thing no matter what file system it is.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -  I will try to come up with a
> > > > > > > > > > > > > > > > reproducer using
> > > > > > > > > > > > > > > > xfs or
> > > > > > > > > > > > > > > > btrfs
> > > > > > > > > > > > > > > > if possible.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Now back to the problem itself, here are
> > > > > > > > > > > > > > > > my
> > > > > > > > > > > > > > > > findings:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - nfsd_file_put() in one thread can race
> > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > another
> > > > > > > > > > > > > > > > thread
> > > > > > > > > > > > > > > > doing
> > > > > > > > > > > > > > > > garbage collection (running
> > > > > > > > > > > > > > > > nfsd_file_gc() ->
> > > > > > > > > > > > > > > > list_lru_walk()
> > > > > > > > > > > > > > > > ->
> > > > > > > > > > > > > > > > nfsd_file_lru_cb()):
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   * In nfsd_file_put(), nf->nf_ref is 1,
> > > > > > > > > > > > > > > > so it
> > > > > > > > > > > > > > > > tries to
> > > > > > > > > > > > > > > > do
> > > > > > > > > > > > > > > > nfsd_file_lru_add().
> > > > > > > > > > > > > > > >   * nfsd_file_lru_add() returns true
> > > > > > > > > > > > > > > > (thus
> > > > > > > > > > > > > > > > NFSD_FILE_REFERENCED bit
> > > > > > > > > > > > > > > > set for nf->nf_flags)
> > > > > > > > > > > > > > > >   * garbage collector kicks in,
> > > > > > > > > > > > > > > > nfsd_file_lru_cb() clears
> > > > > > > > > > > > > > > > REFERENCED
> > > > > > > > > > > > > > > > bit and returns LRU_ROTATE.
> > > > > > > > > > > > > > > >   * garbage collector kicks in again,
> > > > > > > > > > > > > > > > nfsd_file_lru_cb()
> > > > > > > > > > > > > > > > now
> > > > > > > > > > > > > > > > decrements nf->nf_ref to 0, runs
> > > > > > > > > > > > > > > > nfsd_file_unhash(),
> > > > > > > > > > > > > > > > removes
> > > > > > > > > > > > > > > > it from
> > > > > > > > > > > > > > > > the LRU and adds to the dispose list
> > > > > > > > > > > > > > > > [list_lru_isolate_move(lru,
> > > > > > > > > > > > > > > > &nf->nf_lru, head);]
> > > > > > > > > > > > > > > >   * nfsd_file_put() detects
> > > > > > > > > > > > > > > > NFSD_FILE_HASHED bit
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > cleared,
> > > > > > > > > > > > > > > > so it
> > > > > > > > > > > > > > > > tries to remove the 'nf' from the LRU [if
> > > > > > > > > > > > > > > > (!nfsd_file_lru_remove(nf))]
> > > > > > > > > > > > > > > >   * The 'nf' has been added to the
> > > > > > > > > > > > > > > > 'dispose' list
> > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > nfsd_file_lru_cb(), so
> > > > > > > > > > > > > > > > nfsd_file_lru_remove(nf)
> > > > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > treats
> > > > > > > > > > > > > > > > it as
> > > > > > > > > > > > > > > > part of the LRU and removes it, which
> > > > > > > > > > > > > > > > leads it to
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > removed
> > > > > > > > > > > > > > > > from the
> > > > > > > > > > > > > > > > 'dispose' list.
> > > > > > > > > > > > > > > >   * At this moment, nf->nf_ref is 0, it's
> > > > > > > > > > > > > > > > unhashed, and
> > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > on the
> > > > > > > > > > > > > > > > LRU. nfsd_file_put() continues its
> > > > > > > > > > > > > > > > execution [if
> > > > > > > > > > > > > > > > (refcount_dec_and_test(&nf->nf_ref))], as
> > > > > > > > > > > > > > > > nf-
> > > > > > > > > > > > > > > > > nf_ref is
> > > > > > > > > > > > > > > > already 0, now
> > > > > > > > > > > > > > > > bad thing happens: nf->nf_ref is set to
> > > > > > > > > > > > > > > > REFCOUNT_SATURATED,
> > > > > > > > > > > > > > > > and the
> > > > > > > > > > > > > > > > 'nf' is leaked.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To make this happen, the right timing is
> > > > > > > > > > > > > > > > crucial.
> > > > > > > > > > > > > > > > It can
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > reproduced
> > > > > > > > > > > > > > > > by adding artifical delays in
> > > > > > > > > > > > > > > > filecache.c, or
> > > > > > > > > > > > > > > > hammering
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > nfsd with
> > > > > > > > > > > > > > > > tons of ops.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Let's see how nfsd_file_put() can race
> > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > nfsd_file_cond_queue():
> > > > > > > > > > > > > > > >   * In nfsd_file_put(), nf->nf_ref is 1,
> > > > > > > > > > > > > > > > so it
> > > > > > > > > > > > > > > > tries to
> > > > > > > > > > > > > > > > do
> > > > > > > > > > > > > > > > nfsd_file_lru_add().
> > > > > > > > > > > > > > > >   * nfsd_file_lru_add() sets REFERENCED
> > > > > > > > > > > > > > > > bit and
> > > > > > > > > > > > > > > > returns
> > > > > > > > > > > > > > > > true.
> > > > > > > > > > > > > > > >   * 'exportfs -f' or something like that
> > > > > > > > > > > > > > > > triggers
> > > > > > > > > > > > > > > > __nfsd_file_cache_purge() ->
> > > > > > > > > > > > > > > > nfsd_file_cond_queue().
> > > > > > > > > > > > > > > >   * In nfsd_file_cond_queue(), it runs
> > > > > > > > > > > > > > > > [if
> > > > > > > > > > > > > > > > (!nfsd_file_unhash(nf))],
> > > > > > > > > > > > > > > > unhash is done successfully.
> > > > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > > > (!nfsd_file_get(nf))],
> > > > > > > > > > > > > > > > now
> > > > > > > > > > > > > > > > nf->nf_ref goes to 2.
> > > > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > > > (nfsd_file_lru_remove(nf))], it succeeds.
> > > > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > > > (refcount_sub_and_test(decrement,
> > > > > > > > > > > > > > > > &nf->nf_ref))] (with "decrement" being
> > > > > > > > > > > > > > > > 2), so the
> > > > > > > > > > > > > > > > nf-
> > > > > > > > > > > > > > > > > nf_ref
> > > > > > > > > > > > > > > > goes to
> > > > > > > > > > > > > > > > 0, the 'nf' is added to the dispost list
> > > > > > > > > > > > > > > > [list_add(&nf-
> > > > > > > > > > > > > > > > > nf_lru,
> > > > > > > > > > > > > > > > dispose)]
> > > > > > > > > > > > > > > >   * nfsd_file_put() detects
> > > > > > > > > > > > > > > > NFSD_FILE_HASHED bit
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > cleared,
> > > > > > > > > > > > > > > > so it
> > > > > > > > > > > > > > > > tries to remove the 'nf' from the LRU [if
> > > > > > > > > > > > > > > > (!nfsd_file_lru_remove(nf))], although
> > > > > > > > > > > > > > > > the 'nf'
> > > > > > > > > > > > > > > > is not in
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > LRU, but
> > > > > > > > > > > > > > > > it is linked in the 'dispose' list,
> > > > > > > > > > > > > > > > nfsd_file_lru_remove()
> > > > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > treats it as part of the LRU and removes
> > > > > > > > > > > > > > > > it. This
> > > > > > > > > > > > > > > > leads
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > its removal
> > > > > > > > > > > > > > > > from the 'dispose' list!
> > > > > > > > > > > > > > > >   * Now nf->ref is 0, unhashed.
> > > > > > > > > > > > > > > > nfsd_file_put()
> > > > > > > > > > > > > > > > continues
> > > > > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > execution and sets nf->nf_ref to
> > > > > > > > > > > > > > > > REFCOUNT_SATURATED.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The purpose of nf->nf_lru is problematic.
> > > > > > > > > > > > > > > > As you
> > > > > > > > > > > > > > > > can see,
> > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > is used
> > > > > > > > > > > > > > > > for the LRU list, and also the 'dispose'
> > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > Adding
> > > > > > > > > > > > > > > > another
> > > > > > > > > > > > > > > > 'struct
> > > > > > > > > > > > > > > > list_head' specifically for the 'dispose'
> > > > > > > > > > > > > > > > list
> > > > > > > > > > > > > > > > seems to
> > > > > > > > > > > > > > > > be a
> > > > > > > > > > > > > > > > better
> > > > > > > > > > > > > > > > way of fixing this race condition. Either
> > > > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > works for
> > > > > > > > > > > > > > > > me.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Would you agree my above analysis makes
> > > > > > > > > > > > > > > > sense?
> > > > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think so. It's been a while since I've
> > > > > > > > > > > > > > > done much
> > > > > > > > > > > > > > > work in
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > code,
> > > > > > > > > > > > > > > but it does sound like there is a race in
> > > > > > > > > > > > > > > the LRU
> > > > > > > > > > > > > > > handling.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Like Chuck said, the nf->nf_lru list should
> > > > > > > > > > > > > > > be safe
> > > > > > > > > > > > > > > to use
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > multiple
> > > > > > > > > > > > > > > purposes, but that's only the case if we're
> > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > using that
> > > > > > > > > > > > > > > list
> > > > > > > > > > > > > > > as an
> > > > > > > > > > > > > > > indicator.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The list_lru code does check this:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >     if (!list_empty(item)) {
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ...so if we ever check this while it's
> > > > > > > > > > > > > > > sitting on
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > > list, it
> > > > > > > > > > > > > > > will handle it incorrectly. It sounds like
> > > > > > > > > > > > > > > that's
> > > > > > > > > > > > > > > the root
> > > > > > > > > > > > > > > cause of the
> > > > > > > > > > > > > > > problem you're seeing?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If so, then maybe a separate list_head for
> > > > > > > > > > > > > > > disposal
> > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > better.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Here is my patch with signed-off-by:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > > > > > > > > > > > > > Date: Mon, 1 Jul 2024 06:45:22 -0400
> > > > > > > > > > > > > > > > Subject: [PATCH] nfsd: fix nfsd_file
> > > > > > > > > > > > > > > > leaking due
> > > > > > > > > > > > > > > > to race
> > > > > > > > > > > > > > > > condition and early
> > > > > > > > > > > > > > > >  unhash
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Signed-off-by: Youzhong Yang
> > > > > > > > > > > > > > > > <youzhong@xxxxxxxxx>
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  fs/nfsd/filecache.c | 14 +++++++++++++-
> > > > > > > > > > > > > > > >  1 file changed, 13 insertions(+), 1
> > > > > > > > > > > > > > > > deletion(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > index 1a6d5d000b85..2323829f7208 100644
> > > > > > > > > > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > @@ -389,6 +389,17 @@ nfsd_file_put(struct
> > > > > > > > > > > > > > > > nfsd_file *nf)
> > > > > > > > > > > > > > > >                         if
> > > > > > > > > > > > > > > > (!nfsd_file_lru_remove(nf))
> > > > > > > > > > > > > > > >                                 return;
> > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > > +               /*
> > > > > > > > > > > > > > > > +                * Racing with
> > > > > > > > > > > > > > > > nfsd_file_cond_queue() or
> > > > > > > > > > > > > > > > nfsd_file_lru_cb(),
> > > > > > > > > > > > > > > > +                * it's unhashed but then
> > > > > > > > > > > > > > > > removed
> > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > dispose list,
> > > > > > > > > > > > > > > > +                * so we need to free it.
> > > > > > > > > > > > > > > > +                */
> > > > > > > > > > > > > > > > +               if (refcount_read(&nf-
> > > > > > > > > > > > > > > > >nf_ref) ==
> > > > > > > > > > > > > > > > 0 &&
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > A refcount_read in this path is a red flag
> > > > > > > > > > > > > > > to me.
> > > > > > > > > > > > > > > Anytime
> > > > > > > > > > > > > > > you're just
> > > > > > > > > > > > > > > looking at the refcount without changing
> > > > > > > > > > > > > > > anything
> > > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > screams
> > > > > > > > > > > > > > > out
> > > > > > > > > > > > > > > "race condition".
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In this case, what guarantee is there that
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > won't run
> > > > > > > > > > > > > > > afoul
> > > > > > > > > > > > > > > of the
> > > > > > > > > > > > > > > timing? We could check this and find out
> > > > > > > > > > > > > > > it's 1
> > > > > > > > > > > > > > > just before
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > goes to
> > > > > > > > > > > > > > > 0 and you check the other conditions.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Does anything prevent that?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > !test_bit(NFSD_FILE_HASHED,
> > > > > > > > > > > > > > > > &nf-
> > > > > > > > > > > > > > > > > nf_flags) &&
> > > > > > > > > > > > > > > > +                   list_empty(&nf-
> > > > > > > > > > > > > > > > >nf_lru)) {
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > nfsd_file_free(nf);
> > > > > > > > > > > > > > > > +                       return;
> > > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >         if (refcount_dec_and_test(&nf-
> > > > > > > > > > > > > > > > >nf_ref))
> > > > > > > > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > > > > > > > > @@ -576,7 +587,7 @@
> > > > > > > > > > > > > > > > nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > > > *nf, struct
> > > > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > > > >         int decrement = 1;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         /* If we raced with someone else
> > > > > > > > > > > > > > > > unhashing,
> > > > > > > > > > > > > > > > ignore it
> > > > > > > > > > > > > > > > */
> > > > > > > > > > > > > > > > -       if (!nfsd_file_unhash(nf))
> > > > > > > > > > > > > > > > +       if (!test_bit(NFSD_FILE_HASHED,
> > > > > > > > > > > > > > > > &nf-
> > > > > > > > > > > > > > > > > nf_flags))
> > > > > > > > > > > > > > > >                 return;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Same here: you're just testing for the
> > > > > > > > > > > > > > > HASHED bit,
> > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > this also
> > > > > > > > > > > > > > > race with someone who is setting it just
> > > > > > > > > > > > > > > after you
> > > > > > > > > > > > > > > get
> > > > > > > > > > > > > > > here.
> > > > > > > > > > > > > > > Why is
> > > > > > > > > > > > > > > that not a problem?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         /* If we can't get a reference,
> > > > > > > > > > > > > > > > ignore it
> > > > > > > > > > > > > > > > */
> > > > > > > > > > > > > > > > @@ -590,6 +601,7 @@
> > > > > > > > > > > > > > > > nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > > > *nf, struct
> > > > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > > > >         /* If refcount goes to 0, then
> > > > > > > > > > > > > > > > put on the
> > > > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > > > list */
> > > > > > > > > > > > > > > >         if
> > > > > > > > > > > > > > > > (refcount_sub_and_test(decrement, &nf-
> > > > > > > > > > > > > > > > > nf_ref)) {
> > > > > > > > > > > > > > > >                 list_add(&nf->nf_lru,
> > > > > > > > > > > > > > > > dispose);
> > > > > > > > > > > > > > > > +               nfsd_file_unhash(nf);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > trace_nfsd_file_closing(nf);
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.43.0
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jul 3, 2024 at 2:21 PM Chuck
> > > > > > > > > > > > > > > > Lever
> > > > > > > > > > > > > > > > <chuck.lever@xxxxxxxxxx> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jul 03, 2024 at 10:12:33AM -
> > > > > > > > > > > > > > > > > 0400,
> > > > > > > > > > > > > > > > > Youzhong Yang
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'd like to report a nfsd_file
> > > > > > > > > > > > > > > > > > leaking issue
> > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > propose
> > > > > > > > > > > > > > > > > > a fix for it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > When I tested Linux kernel 6.8 and
> > > > > > > > > > > > > > > > > > 6.6, I
> > > > > > > > > > > > > > > > > > noticed
> > > > > > > > > > > > > > > > > > nfsd_file leaks
> > > > > > > > > > > > > > > > > > which led to undestroyable file
> > > > > > > > > > > > > > > > > > systems
> > > > > > > > > > > > > > > > > > (zfs),
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the report. Some initial
> > > > > > > > > > > > > > > > > comments:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Do you have a specific reproducer? In
> > > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > words,
> > > > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > is the
> > > > > > > > > > > > > > > > >   simplest program that can run on an
> > > > > > > > > > > > > > > > > NFS
> > > > > > > > > > > > > > > > > client that
> > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > trigger
> > > > > > > > > > > > > > > > >   this leak, and can you post it?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - "zfs" is an out-of-tree file system,
> > > > > > > > > > > > > > > > > so it's
> > > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > directly
> > > > > > > > > > > > > > > > >   supported for NFSD.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - The guidelines for patch submission
> > > > > > > > > > > > > > > > > require
> > > > > > > > > > > > > > > > > us to fix
> > > > > > > > > > > > > > > > > issues in
> > > > > > > > > > > > > > > > >   upstream Linux first (currently
> > > > > > > > > > > > > > > > > that's v6.10-
> > > > > > > > > > > > > > > > > rc6).
> > > > > > > > > > > > > > > > > Then
> > > > > > > > > > > > > > > > > that fix
> > > > > > > > > > > > > > > > >   can be backported to older stable
> > > > > > > > > > > > > > > > > kernels
> > > > > > > > > > > > > > > > > like 6.6.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you reproduce the leak with one of
> > > > > > > > > > > > > > > > > the in-
> > > > > > > > > > > > > > > > > kernel
> > > > > > > > > > > > > > > > > filesystems
> > > > > > > > > > > > > > > > > (either xfs or btrfs would be great)
> > > > > > > > > > > > > > > > > and with
> > > > > > > > > > > > > > > > > NFSD in
> > > > > > > > > > > > > > > > > 6.10-
> > > > > > > > > > > > > > > > > rc6?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > One more comment below.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > here are some examples:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > crash> struct nfsd_file -x
> > > > > > > > > > > > > > > > > > ffff88e160db0460
> > > > > > > > > > > > > > > > > > struct nfsd_file {
> > > > > > > > > > > > > > > > > >   nf_rlist = {
> > > > > > > > > > > > > > > > > >     rhead = {
> > > > > > > > > > > > > > > > > >       next = 0xffff8921fa2392f1
> > > > > > > > > > > > > > > > > >     },
> > > > > > > > > > > > > > > > > >     next = 0x0
> > > > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > > > >   nf_inode = 0xffff8882bc312ef8,
> > > > > > > > > > > > > > > > > >   nf_file = 0xffff88e2015b1500,
> > > > > > > > > > > > > > > > > >   nf_cred = 0xffff88e3ab0e7800,
> > > > > > > > > > > > > > > > > >   nf_net = 0xffffffff83d41600
> > > > > > > > > > > > > > > > > > <init_net>,
> > > > > > > > > > > > > > > > > >   nf_flags = 0x8,
> > > > > > > > > > > > > > > > > >   nf_ref = {
> > > > > > > > > > > > > > > > > >     refs = {
> > > > > > > > > > > > > > > > > >       counter = 0xc0000000
> > > > > > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > > > >   nf_may = 0x4,
> > > > > > > > > > > > > > > > > >   nf_mark = 0xffff88e1bddfb320,
> > > > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > > > >     next = 0xffff88e160db04a8,
> > > > > > > > > > > > > > > > > >     prev = 0xffff88e160db04a8
> > > > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > > > >   nf_rcu = {
> > > > > > > > > > > > > > > > > >     next = 0x10000000000,
> > > > > > > > > > > > > > > > > >     func = 0x0
> > > > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > > > >   nf_birthtime = 0x73d22fc1728
> > > > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > crash> struct
> > > > > > > > > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counte
> > > > > > > > > > > > > > > > > > r,nf_lru
> > > > > > > > > > > > > > > > > > ,nf_file
> > > > > > > > > > > > > > > > > > -x
> > > > > > > > > > > > > > > > > > ffff88839a53d850
> > > > > > > > > > > > > > > > > >   nf_flags = 0x8,
> > > > > > > > > > > > > > > > > >   nf_ref.refs.counter = 0x0
> > > > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > > > >     next = 0xffff88839a53d898,
> > > > > > > > > > > > > > > > > >     prev = 0xffff88839a53d898
> > > > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > > > >   nf_file = 0xffff88810ede8700,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > crash> struct
> > > > > > > > > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counte
> > > > > > > > > > > > > > > > > > r,nf_lru
> > > > > > > > > > > > > > > > > > ,nf_file
> > > > > > > > > > > > > > > > > > -x
> > > > > > > > > > > > > > > > > > ffff88c32b11e850
> > > > > > > > > > > > > > > > > >   nf_flags = 0x8,
> > > > > > > > > > > > > > > > > >   nf_ref.refs.counter = 0x0
> > > > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > > > >     next = 0xffff88c32b11e898,
> > > > > > > > > > > > > > > > > >     prev = 0xffff88c32b11e898
> > > > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > > > >   nf_file = 0xffff88c20a701c00,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > crash> struct
> > > > > > > > > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counte
> > > > > > > > > > > > > > > > > > r,nf_lru
> > > > > > > > > > > > > > > > > > ,nf_file
> > > > > > > > > > > > > > > > > > -x
> > > > > > > > > > > > > > > > > > ffff88e372709700
> > > > > > > > > > > > > > > > > >   nf_flags = 0xc,
> > > > > > > > > > > > > > > > > >   nf_ref.refs.counter = 0x0
> > > > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > > > >     next = 0xffff88e372709748,
> > > > > > > > > > > > > > > > > >     prev = 0xffff88e372709748
> > > > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > > > >   nf_file = 0xffff88e0725e6400,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > crash> struct
> > > > > > > > > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counte
> > > > > > > > > > > > > > > > > > r,nf_lru
> > > > > > > > > > > > > > > > > > ,nf_file
> > > > > > > > > > > > > > > > > > -x
> > > > > > > > > > > > > > > > > > ffff8982864944d0
> > > > > > > > > > > > > > > > > >   nf_flags = 0xc,
> > > > > > > > > > > > > > > > > >   nf_ref.refs.counter = 0x0
> > > > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > > > >     next = 0xffff898286494518,
> > > > > > > > > > > > > > > > > >     prev = 0xffff898286494518
> > > > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > > > >   nf_file = 0xffff89803c0ff700,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The leak occurs when nfsd_file_put()
> > > > > > > > > > > > > > > > > > races
> > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > nfsd_file_cond_queue()
> > > > > > > > > > > > > > > > > > or nfsd_file_lru_cb(). With the
> > > > > > > > > > > > > > > > > > following
> > > > > > > > > > > > > > > > > > patch, I
> > > > > > > > > > > > > > > > > > haven't observed
> > > > > > > > > > > > > > > > > > any leak after a few days heavy nfs
> > > > > > > > > > > > > > > > > > load:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Our patch submission guidelines require
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > Signed-off-
> > > > > > > > > > > > > > > > > by:
> > > > > > > > > > > > > > > > > line at the end of the patch
> > > > > > > > > > > > > > > > > description. See
> > > > > > > > > > > > > > > > > the "Sign
> > > > > > > > > > > > > > > > > your work -
> > > > > > > > > > > > > > > > > the Developer's Certificate of Origin"
> > > > > > > > > > > > > > > > > section
> > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.10-rc6
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > (Needed here in case your fix is
> > > > > > > > > > > > > > > > > acceptable).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > diff --git a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > > > b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > > > index 1a6d5d000b85..2323829f7208
> > > > > > > > > > > > > > > > > > 100644
> > > > > > > > > > > > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > > > @@ -389,6 +389,17 @@
> > > > > > > > > > > > > > > > > > nfsd_file_put(struct
> > > > > > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > > > > > *nf)
> > > > > > > > > > > > > > > > > >   if (!nfsd_file_lru_remove(nf))
> > > > > > > > > > > > > > > > > >   return;
> > > > > > > > > > > > > > > > > >   }
> > > > > > > > > > > > > > > > > > + /*
> > > > > > > > > > > > > > > > > > + * Racing with
> > > > > > > > > > > > > > > > > > nfsd_file_cond_queue() or
> > > > > > > > > > > > > > > > > > nfsd_file_lru_cb(),
> > > > > > > > > > > > > > > > > > + * it's unhashed but then removed
> > > > > > > > > > > > > > > > > > from the
> > > > > > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > > > > > list,
> > > > > > > > > > > > > > > > > > + * so we need to free it.
> > > > > > > > > > > > > > > > > > + */
> > > > > > > > > > > > > > > > > > + if (refcount_read(&nf->nf_ref) == 0
> > > > > > > > > > > > > > > > > > &&
> > > > > > > > > > > > > > > > > > +     !test_bit(NFSD_FILE_HASHED,
> > > > > > > > > > > > > > > > > > &nf-
> > > > > > > > > > > > > > > > > > > nf_flags) &&
> > > > > > > > > > > > > > > > > > +     list_empty(&nf->nf_lru)) {
> > > > > > > > > > > > > > > > > > + nfsd_file_free(nf);
> > > > > > > > > > > > > > > > > > + return;
> > > > > > > > > > > > > > > > > > + }
> > > > > > > > > > > > > > > > > >   }
> > > > > > > > > > > > > > > > > >   if (refcount_dec_and_test(&nf-
> > > > > > > > > > > > > > > > > > >nf_ref))
> > > > > > > > > > > > > > > > > >   nfsd_file_free(nf);
> > > > > > > > > > > > > > > > > > @@ -576,7 +587,7 @@
> > > > > > > > > > > > > > > > > > nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > > > > > *nf, struct
> > > > > > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > > > > > >   int decrement = 1;
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >   /* If we raced with someone else
> > > > > > > > > > > > > > > > > > unhashing,
> > > > > > > > > > > > > > > > > > ignore
> > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > */
> > > > > > > > > > > > > > > > > > - if (!nfsd_file_unhash(nf))
> > > > > > > > > > > > > > > > > > + if (!test_bit(NFSD_FILE_HASHED,
> > > > > > > > > > > > > > > > > > &nf-
> > > > > > > > > > > > > > > > > > > nf_flags))
> > > > > > > > > > > > > > > > > >   return;
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >   /* If we can't get a reference,
> > > > > > > > > > > > > > > > > > ignore it
> > > > > > > > > > > > > > > > > > */
> > > > > > > > > > > > > > > > > > @@ -590,6 +601,7 @@
> > > > > > > > > > > > > > > > > > nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > > > > > *nf, struct
> > > > > > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > > > > > >   /* If refcount goes to 0, then put
> > > > > > > > > > > > > > > > > > on the
> > > > > > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > > > > > list
> > > > > > > > > > > > > > > > > > */
> > > > > > > > > > > > > > > > > >   if
> > > > > > > > > > > > > > > > > > (refcount_sub_and_test(decrement,
> > > > > > > > > > > > > > > > > > &nf-
> > > > > > > > > > > > > > > > > > > nf_ref))
> > > > > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > > > > >   list_add(&nf->nf_lru, dispose);
> > > > > > > > > > > > > > > > > > + nfsd_file_unhash(nf);
> > > > > > > > > > > > > > > > > >   trace_nfsd_file_closing(nf);
> > > > > > > > > > > > > > > > > >   }
> > > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please kindly review the patch and
> > > > > > > > > > > > > > > > > > let me
> > > > > > > > > > > > > > > > > > know if it
> > > > > > > > > > > > > > > > > > makes sense.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > -Youzhong
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > Chuck Lever
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > >
> > > > > > > --
> > > > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > > >
> > > > > --
> > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > >
> > >
> > > --
> > > Jeff Layton <jlayton@xxxxxxxxxx>
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux