Re: Leaked nfsd_file due to race condition and early unhash (fs/nfsd/filecache.c)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I'd like to report that your patches + the "list_head nf_gc" patch
together have been tested under heavy nfs load generated by machines
in our testing farm, no leak has been observed.

Surely my reproducer can no longer reproduce the issue once the
patches are applied.

FYI here is the reproducer:
https://github.com/youzhongyang/nfsd-file-leaks

On Wed, Jul 10, 2024 at 4:22 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> Great! If you're able to test the patches I sent earlier today, please
> let us know if they help with your reproducer.
>
>
> On Wed, 2024-07-10 at 10:49 -0400, Youzhong Yang wrote:
> > Thanks Jeff. Just sent out a separate e-mail.
> >
> > I have a simple reproducer now, it's a C program, I will upload it to
> > a github repository together with a readme file and then post the link
> > (if allowed) here.
> >
> > On Wed, Jul 10, 2024 at 9:40 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > >
> > > Can you send it as a separate [PATCH] email? I think your MUA is
> > > mangling the patches as I couldn't apply it earlier. Either use git-
> > > send-email or have a look over this page:
> > >
> > >
> > >     https://www.kernel.org/doc/html/latest/process/email-clients.html#email-clients
> > >
> > > Thanks!
> > > Jeff
> > >
> > > On Wed, 2024-07-10 at 09:33 -0400, Youzhong Yang wrote:
> > > > Thanks. Here it is again:
> > > >
> > > > commit c6e69cebc052cb82d91fc81a98aee20749fe8d47
> > > > Author: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > Date:   Thu Jul 4 11:25:40 2024 -0400
> > > >
> > > >     nfsd: fix nfsd_file leaking due to mixed use of nf->nf_lru
> > > >
> > > >     nfsd_file_put() in one thread can race with another thread doing
> > > >     garbage collection (running nfsd_file_gc() -> list_lru_walk() ->
> > > >     nfsd_file_lru_cb()):
> > > >
> > > >       * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do
> > > > nfsd_file_lru_add().
> > > >       * nfsd_file_lru_add() returns true (with NFSD_FILE_REFERENCED
> > > > bit set)
> > > >       * garbage collector kicks in, nfsd_file_lru_cb() clears
> > > > REFERENCED bit and
> > > >         returns LRU_ROTATE.
> > > >       * garbage collector kicks in again, nfsd_file_lru_cb() now
> > > > decrements nf->nf_ref
> > > >         to 0, runs nfsd_file_unhash(), removes it from the LRU and
> > > > adds to the dispose
> > > >         list [list_lru_isolate_move(lru, &nf->nf_lru, head)]
> > > >       * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so
> > > > it
> > > > tries to remove
> > > >         the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. The
> > > > 'nf' has been added
> > > >         to the 'dispose' list by nfsd_file_lru_cb(), so
> > > > nfsd_file_lru_remove(nf) simply
> > > >         treats it as part of the LRU and removes it, which leads to
> > > > its removal from
> > > >         the 'dispose' list.
> > > >       * At this moment, 'nf' is unhashed with its nf_ref being 0, and
> > > > not on the LRU.
> > > >         nfsd_file_put() continues its execution [if
> > > > (refcount_dec_and_test(&nf->nf_ref))],
> > > >         as nf->nf_ref is already 0, nf->nf_ref is set to
> > > > REFCOUNT_SATURATED, and the 'nf'
> > > >         gets no chance of being freed.
> > > >
> > > >     nfsd_file_put() can also race with nfsd_file_cond_queue():
> > > >       * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do
> > > > nfsd_file_lru_add().
> > > >       * nfsd_file_lru_add() sets REFERENCED bit and returns true.
> > > >       * Some userland application runs 'exportfs -f' or something
> > > > like
> > > > that, which triggers
> > > >         __nfsd_file_cache_purge() -> nfsd_file_cond_queue().
> > > >       * In nfsd_file_cond_queue(), it runs [if
> > > > (!nfsd_file_unhash(nf))], unhash is done
> > > >         successfully.
> > > >       * nfsd_file_cond_queue() runs [if (!nfsd_file_get(nf))], now
> > > > nf->nf_ref goes to 2.
> > > >       * nfsd_file_cond_queue() runs [if (nfsd_file_lru_remove(nf))],
> > > > it succeeds.
> > > >       * nfsd_file_cond_queue() runs [if
> > > > (refcount_sub_and_test(decrement, &nf->nf_ref))]
> > > >         (with "decrement" being 2), so the nf->nf_ref goes to 0, the
> > > > 'nf' is added to the
> > > >         dispose list [list_add(&nf->nf_lru, dispose)]
> > > >       * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so
> > > > it
> > > > tries to remove
> > > >         the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))],
> > > > although the 'nf' is not
> > > >         in the LRU, but it is linked in the 'dispose' list,
> > > > nfsd_file_lru_remove() simply
> > > >         treats it as part of the LRU and removes it. This leads to
> > > > its
> > > > removal from
> > > >         the 'dispose' list!
> > > >       * Now nf->ref is 0, unhashed. nfsd_file_put() continues its
> > > > execution and set
> > > >         nf->nf_ref to REFCOUNT_SATURATED.
> > > >
> > > >     As shown in the above analysis, using nf_lru for both the LRU
> > > > list
> > > > and dispose list
> > > >     can cause the leaks. This patch adds a new list_head nf_gc in
> > > > struct nfsd_file, and uses
> > > >     it for the dispose list.
> > > >
> > > >     Signed-off-by: Youzhong Yang <youzhong@xxxxxxxxx>
> > > >
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index ad9083ca144b..22ebd7fb8639 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -216,6 +216,7 @@ nfsd_file_alloc(struct net *net, struct inode
> > > > *inode, unsigned char need,
> > > >                 return NULL;
> > > >
> > > >         INIT_LIST_HEAD(&nf->nf_lru);
> > > > +       INIT_LIST_HEAD(&nf->nf_gc);
> > > >         nf->nf_birthtime = ktime_get();
> > > >         nf->nf_file = NULL;
> > > >         nf->nf_cred = get_current_cred();
> > > > @@ -393,8 +394,8 @@ nfsd_file_dispose_list(struct list_head *dispose)
> > > >         struct nfsd_file *nf;
> > > >
> > > >         while (!list_empty(dispose)) {
> > > > -               nf = list_first_entry(dispose, struct nfsd_file,
> > > > nf_lru);
> > > > -               list_del_init(&nf->nf_lru);
> > > > +               nf = list_first_entry(dispose, struct nfsd_file,
> > > > nf_gc);
> > > > +               list_del_init(&nf->nf_gc);
> > > >                 nfsd_file_free(nf);
> > > >         }
> > > >  }
> > > > @@ -411,12 +412,12 @@ nfsd_file_dispose_list_delayed(struct list_head
> > > > *dispose)
> > > >  {
> > > >         while(!list_empty(dispose)) {
> > > >                 struct nfsd_file *nf = list_first_entry(dispose,
> > > > -                                               struct nfsd_file,
> > > > nf_lru);
> > > > +                                               struct nfsd_file,
> > > > nf_gc);
> > > >                 struct nfsd_net *nn = net_generic(nf->nf_net,
> > > > nfsd_net_id);
> > > >                 struct nfsd_fcache_disposal *l = nn->fcache_disposal;
> > > >
> > > >                 spin_lock(&l->lock);
> > > > -               list_move_tail(&nf->nf_lru, &l->freeme);
> > > > +               list_move_tail(&nf->nf_gc, &l->freeme);
> > > >                 spin_unlock(&l->lock);
> > > >                 svc_wake_up(nn->nfsd_serv);
> > > >         }
> > > > @@ -503,7 +504,8 @@ nfsd_file_lru_cb(struct list_head *item, struct
> > > > list_lru_one *lru,
> > > >
> > > >         /* Refcount went to zero. Unhash it and queue it to the
> > > > dispose list */
> > > >         nfsd_file_unhash(nf);
> > > > -       list_lru_isolate_move(lru, &nf->nf_lru, head);
> > > > +       list_lru_isolate(lru, &nf->nf_lru);
> > > > +       list_add(&nf->nf_gc, head);
> > > >         this_cpu_inc(nfsd_file_evictions);
> > > >         trace_nfsd_file_gc_disposed(nf);
> > > >         return LRU_REMOVED;
> > > > @@ -578,7 +580,7 @@ nfsd_file_cond_queue(struct nfsd_file *nf, struct
> > > > list_head *dispose)
> > > >
> > > >         /* If refcount goes to 0, then put on the dispose list */
> > > >         if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > > > -               list_add(&nf->nf_lru, dispose);
> > > > +               list_add(&nf->nf_gc, dispose);
> > > >                 trace_nfsd_file_closing(nf);
> > > >         }
> > > >  }
> > > > @@ -654,8 +656,8 @@ nfsd_file_close_inode_sync(struct inode *inode)
> > > >
> > > >         nfsd_file_queue_for_close(inode, &dispose);
> > > >         while (!list_empty(&dispose)) {
> > > > -               nf = list_first_entry(&dispose, struct nfsd_file,
> > > > nf_lru);
> > > > -               list_del_init(&nf->nf_lru);
> > > > +               nf = list_first_entry(&dispose, struct nfsd_file,
> > > > nf_gc);
> > > > +               list_del_init(&nf->nf_gc);
> > > >                 nfsd_file_free(nf);
> > > >         }
> > > >  }
> > > > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > > > index c61884def906..3fbec24eea6c 100644
> > > > --- a/fs/nfsd/filecache.h
> > > > +++ b/fs/nfsd/filecache.h
> > > > @@ -44,6 +44,7 @@ struct nfsd_file {
> > > >
> > > >         struct nfsd_file_mark   *nf_mark;
> > > >         struct list_head        nf_lru;
> > > > +       struct list_head        nf_gc;
> > > >         struct rcu_head         nf_rcu;
> > > >         ktime_t                 nf_birthtime;
> > > >  };
> > > >
> > > > On Wed, Jul 10, 2024 at 8:49 AM Jeff Layton <jlayton@xxxxxxxxxx>
> > > > wrote:
> > > > >
> > > > > Ok, thanks. I went crawling over the filecache code, and found a
> > > > > couple
> > > > > of potential nfsd_file leaks. I'm testing patches now and will
> > > > > likely
> > > > > send them along later today.
> > > > >
> > > > > I think we do still want the patch to add nf_gc list_head. I'm not
> > > > > certain there is a race there, but it's safer to just use a
> > > > > separate
> > > > > list_head and it doesn't represent a lot of memory.
> > > > >
> > > > > Could you post that one individually so that Chuck can pick it up?
> > > > >
> > > > > Thanks,
> > > > > Jeff
> > > > >
> > > > > On Tue, 2024-07-09 at 15:13 -0400, Youzhong Yang wrote:
> > > > > > It's not on the LRU:
> > > > > >
> > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru,nf_gc
> > > > > > ffff898107b95980
> > > > > >   nf_flags = 12,
> > > > > >   nf_ref.refs.counter = 1
> > > > > >   nf_lru = {
> > > > > >     next = 0xffff898107b959c8,
> > > > > >     prev = 0xffff898107b959c8
> > > > > >   },
> > > > > >   nf_gc = {
> > > > > >     next = 0xffff898107b959d8,
> > > > > >     prev = 0xffff898107b959d8
> > > > > >   },
> > > > > >
> > > > > > I don't have an easy reproducer now as I am having difficulty
> > > > > > generating tons of NFS read/write ops using a limited number of
> > > > > > nfs
> > > > > > clients. I use our in-house testing farm to test the patch.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > On Tue, Jul 9, 2024 at 3:05 PM Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, 2024-07-09 at 14:37 -0400, Youzhong Yang wrote:
> > > > > > > > Thanks Jeff.
> > > > > > > >
> > > > > > > > Unfortunately the early unhash easily leads to leaks:
> > > > > > > >
> > > > > > > > crash> kmem -S nfsd_file | grep '\[ffff' | sed -e 's|\[||' -e
> > > > > > > > 's|\]||'
> > > > > > > > > xargs -i echo struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > > '{}' >
> > > > > > > > /var/tmp/nfsd_files
> > > > > > > > crash> !wc -l /var/tmp/nfsd_files
> > > > > > > > 19 /var/tmp/nfsd_files
> > > > > > > > crash> < /var/tmp/nfsd_files
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff88865c778900
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 2
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff88865c778cc0
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 3
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff8885d5f35e00
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 1
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff88817443e780
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 3
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff88818b3f0600
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 2
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff88a4490f8300
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 1
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff88a0dab183c0
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 40
> > > > > > >
> > > > > > > That's a lot of references! Might be interesting to look more
> > > > > > > closely
> > > > > > > at that one, but the refcounts are all over the place, so it
> > > > > > > really
> > > > > > > does look like we just have a refcount leak somewhere.
> > > > > > >
> > > > > > > > ...
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff89209535f200
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 2
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff8980e15138c0
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 7
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff898107b95800
> > > > > > > >   nf_flags = 8,
> > > > > > > >   nf_ref.refs.counter = 3
> > > > > > > > crash> struct nfsd_file.nf_flags,nf_ref.refs.counter
> > > > > > > > ffff898107b95980
> > > > > > > >   nf_flags = 12,
> > > > > > >
> > > > > > > The others are NFSD_FILE_GC. This one is also
> > > > > > > NFSD_FILE_REFERENCED. Are
> > > > > > > these objects on the LRU?
> > > > > > >
> > > > > > > >   nf_ref.refs.counter = 1
> > > > > > > >
> > > > > > > > nfsd_file_do_acquire() -> nfsd_file_lookup_locked() relies on
> > > > > > > > the
> > > > > > > > hash
> > > > > > > > table to find the nfsd_file,
> > > > > > > > but I am still scratching my head why and how this happens.
> > > > > > > >
> > > > > > > > FYI, here is the patch I applied for testing:
> > > > > > > >
> > > > > > >
> > > > > > > The above suggests to me that there is a garden-variety
> > > > > > > refcount leak
> > > > > > > somewhere. Whenever some bit of the code takes a reference to
> > > > > > > an object
> > > > > > > (like a nfsd_file), it's implicitly required to eventually put
> > > > > > > that
> > > > > > > reference. Typically that means that the code needs to maintain
> > > > > > > a
> > > > > > > pointer to that object, as it can be unhashed at any time and
> > > > > > > it can't
> > > > > > > rely on finding the same object later.
> > > > > > >
> > > > > > > Do you have a reproducer for this?
> > > > > > >
> > > > > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > > > > > index ad9083ca144b..22ebd7fb8639 100644
> > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > @@ -216,6 +216,7 @@ nfsd_file_alloc(struct net *net, struct
> > > > > > > > inode
> > > > > > > > *inode, unsigned char need,
> > > > > > > >                 return NULL;
> > > > > > > >
> > > > > > > >         INIT_LIST_HEAD(&nf->nf_lru);
> > > > > > > > +       INIT_LIST_HEAD(&nf->nf_gc);
> > > > > > > >         nf->nf_birthtime = ktime_get();
> > > > > > > >         nf->nf_file = NULL;
> > > > > > > >         nf->nf_cred = get_current_cred();
> > > > > > > > @@ -393,8 +394,8 @@ nfsd_file_dispose_list(struct list_head
> > > > > > > > *dispose)
> > > > > > > >         struct nfsd_file *nf;
> > > > > > > >
> > > > > > > >         while (!list_empty(dispose)) {
> > > > > > > > -               nf = list_first_entry(dispose, struct
> > > > > > > > nfsd_file,
> > > > > > > > nf_lru);
> > > > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > > > +               nf = list_first_entry(dispose, struct
> > > > > > > > nfsd_file,
> > > > > > > > nf_gc);
> > > > > > > > +               list_del_init(&nf->nf_gc);
> > > > > > > >                 nfsd_file_free(nf);
> > > > > > > >         }
> > > > > > > >  }
> > > > > > > > @@ -411,12 +412,12 @@ nfsd_file_dispose_list_delayed(struct
> > > > > > > > list_head
> > > > > > > > *dispose)
> > > > > > > >  {
> > > > > > > >         while(!list_empty(dispose)) {
> > > > > > > >                 struct nfsd_file *nf =
> > > > > > > > list_first_entry(dispose,
> > > > > > > > -                                               struct
> > > > > > > > nfsd_file,
> > > > > > > > nf_lru);
> > > > > > > > +                                               struct
> > > > > > > > nfsd_file,
> > > > > > > > nf_gc);
> > > > > > > >                 struct nfsd_net *nn = net_generic(nf->nf_net,
> > > > > > > > nfsd_net_id);
> > > > > > > >                 struct nfsd_fcache_disposal *l = nn-
> > > > > > > > > fcache_disposal;
> > > > > > > >
> > > > > > > >                 spin_lock(&l->lock);
> > > > > > > > -               list_move_tail(&nf->nf_lru, &l->freeme);
> > > > > > > > +               list_move_tail(&nf->nf_gc, &l->freeme);
> > > > > > > >                 spin_unlock(&l->lock);
> > > > > > > >                 svc_wake_up(nn->nfsd_serv);
> > > > > > > >         }
> > > > > > > > @@ -503,7 +504,8 @@ nfsd_file_lru_cb(struct list_head *item,
> > > > > > > > struct
> > > > > > > > list_lru_one *lru,
> > > > > > > >
> > > > > > > >         /* Refcount went to zero. Unhash it and queue it to
> > > > > > > > the
> > > > > > > > dispose list */
> > > > > > > >         nfsd_file_unhash(nf);
> > > > > > > > -       list_lru_isolate_move(lru, &nf->nf_lru, head);
> > > > > > > > +       list_lru_isolate(lru, &nf->nf_lru);
> > > > > > > > +       list_add(&nf->nf_gc, head);
> > > > > > > >         this_cpu_inc(nfsd_file_evictions);
> > > > > > > >         trace_nfsd_file_gc_disposed(nf);
> > > > > > > >         return LRU_REMOVED;
> > > > > > > > @@ -578,7 +580,7 @@ nfsd_file_cond_queue(struct nfsd_file
> > > > > > > > *nf, struct
> > > > > > > > list_head *dispose)
> > > > > > > >
> > > > > > > >         /* If refcount goes to 0, then put on the dispose
> > > > > > > > list */
> > > > > > > >         if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > > > > > > > -               list_add(&nf->nf_lru, dispose);
> > > > > > > > +               list_add(&nf->nf_gc, dispose);
> > > > > > > >                 trace_nfsd_file_closing(nf);
> > > > > > > >         }
> > > > > > > >  }
> > > > > > > > @@ -654,8 +656,8 @@ nfsd_file_close_inode_sync(struct inode
> > > > > > > > *inode)
> > > > > > > >
> > > > > > > >         nfsd_file_queue_for_close(inode, &dispose);
> > > > > > > >         while (!list_empty(&dispose)) {
> > > > > > > > -               nf = list_first_entry(&dispose, struct
> > > > > > > > nfsd_file,
> > > > > > > > nf_lru);
> > > > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > > > +               nf = list_first_entry(&dispose, struct
> > > > > > > > nfsd_file,
> > > > > > > > nf_gc);
> > > > > > > > +               list_del_init(&nf->nf_gc);
> > > > > > > >                 nfsd_file_free(nf);
> > > > > > > >         }
> > > > > > > >  }
> > > > > > > > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > > > > > > > index c61884def906..3fbec24eea6c 100644
> > > > > > > > --- a/fs/nfsd/filecache.h
> > > > > > > > +++ b/fs/nfsd/filecache.h
> > > > > > > > @@ -44,6 +44,7 @@ struct nfsd_file {
> > > > > > > >
> > > > > > > >         struct nfsd_file_mark   *nf_mark;
> > > > > > > >         struct list_head        nf_lru;
> > > > > > > > +       struct list_head        nf_gc;
> > > > > > > >         struct rcu_head         nf_rcu;
> > > > > > > >         ktime_t                 nf_birthtime;
> > > > > > > >  };
> > > > > > > >
> > > > > > > > On Mon, Jul 8, 2024 at 10:55 AM Jeff Layton
> > > > > > > > <jlayton@xxxxxxxxxx>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 2024-07-08 at 10:23 -0400, Youzhong Yang wrote:
> > > > > > > > > > Thanks Jeff.
> > > > > > > > > >
> > > > > > > > > > I am ok with reverting the unhash/dispose list reordering
> > > > > > > > > > in
> > > > > > > > > > nfsd_file_lru_cb(), as it doesn't make much of a
> > > > > > > > > > difference,
> > > > > > > > > > but for nfsd_file_cond_queue(), imagining this:
> > > > > > > > > >
> > > > > > > > > > - A nfsd_file is hashed
> > > > > > > > > > - In nfsd_file_cond_queue(), [if (!nfsd_file_unhash(nf))]
> > > > > > > > > > will
> > > > > > > > > > get it
> > > > > > > > > > unhashed, doesn't it?
> > > > > > > > > > - It continues to get a reference by nfsd_file_get()
> > > > > > > > > > - It continues to remove itself from LRU by
> > > > > > > > > > nfsd_file_lru_remove() if
> > > > > > > > > > it is on the LRU.
> > > > > > > > > > - Now it runs refcount_sub_and_test(), what happens if
> > > > > > > > > > the refcnt
> > > > > > > > > > does
> > > > > > > > > > not go to 0? How can this nfsd_file be found again?
> > > > > > > > > > Through the
> > > > > > > > > > hash
> > > > > > > > > > table? Through the LRU walk? how?
> > > > > > > > > >
> > > > > > > > > > Thanks again.
> > > > > > > > > >
> > > > > > > > > > -Youzhong
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > It won't need to be found again. The holders of the extra
> > > > > > > > > references
> > > > > > > > > will put those references when they are finished. Since the
> > > > > > > > > object
> > > > > > > > > is
> > > > > > > > > no longer HASHED, nfsd_file_put just does this:
> > > > > > > > >
> > > > > > > > >         if (refcount_dec_and_test(&nf->nf_ref))
> > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > >
> > > > > > > > > So that should be fine.
> > > > > > > > >
> > > > > > > > > > On Mon, Jul 8, 2024 at 9:35 AM Jeff Layton
> > > > > > > > > > <jlayton@xxxxxxxxxx>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 2024-07-08 at 08:58 -0400, Youzhong Yang wrote:
> > > > > > > > > > > > Thank you Jeff for your invaluable insights. I was
> > > > > > > > > > > > leaning
> > > > > > > > > > > > towards
> > > > > > > > > > > > adding a new list_head too, and tested this approach
> > > > > > > > > > > > on
> > > > > > > > > > > > kernel
> > > > > > > > > > > > 6.6 by
> > > > > > > > > > > > continuously hammering the server with heavy nfs load
> > > > > > > > > > > > for the
> > > > > > > > > > > > last few
> > > > > > > > > > > > days, not a single leak.
> > > > > > > > > > > >
> > > > > > > > > > > > Here goes the patch (based on Linux kernel master
> > > > > > > > > > > > branch),
> > > > > > > > > > > > please
> > > > > > > > > > > > review:
> > > > > > > > > > > >
> > > > > > > > > > > > From: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > > > > > > > > > Date: Thu, 4 Jul 2024 11:25:40 -0400
> > > > > > > > > > > > Subject: [PATCH] nfsd: fix nfsd_file leaking due to
> > > > > > > > > > > > mixed use
> > > > > > > > > > > > of
> > > > > > > > > > > > nf->nf_lru
> > > > > > > > > > > >
> > > > > > > > > > > > nfsd_file_put() in one thread can race with another
> > > > > > > > > > > > thread
> > > > > > > > > > > > doing
> > > > > > > > > > > > garbage collection (running nfsd_file_gc() ->
> > > > > > > > > > > > list_lru_walk()
> > > > > > > > > > > > ->
> > > > > > > > > > > > nfsd_file_lru_cb()):
> > > > > > > > > > > >
> > > > > > > > > > > >   * In nfsd_file_put(), nf->nf_ref is 1, so it tries
> > > > > > > > > > > > to do
> > > > > > > > > > > > nfsd_file_lru_add().
> > > > > > > > > > > >   * nfsd_file_lru_add() returns true (with
> > > > > > > > > > > > NFSD_FILE_REFERENCED
> > > > > > > > > > > > bit set)
> > > > > > > > > > > >   * garbage collector kicks in, nfsd_file_lru_cb()
> > > > > > > > > > > > clears
> > > > > > > > > > > > REFERENCED bit and
> > > > > > > > > > > >     returns LRU_ROTATE.
> > > > > > > > > > > >   * garbage collector kicks in again,
> > > > > > > > > > > > nfsd_file_lru_cb() now
> > > > > > > > > > > > decrements nf->nf_ref
> > > > > > > > > > > >     to 0, runs nfsd_file_unhash(), removes it from
> > > > > > > > > > > > the LRU
> > > > > > > > > > > > and
> > > > > > > > > > > > adds to
> > > > > > > > > > > > the dispose
> > > > > > > > > > > >     list [list_lru_isolate_move(lru, &nf->nf_lru,
> > > > > > > > > > > > head)]
> > > > > > > > > > > >   * nfsd_file_put() detects NFSD_FILE_HASHED bit is
> > > > > > > > > > > > cleared,
> > > > > > > > > > > > so
> > > > > > > > > > > > it
> > > > > > > > > > > > tries to remove
> > > > > > > > > > > >     the 'nf' from the LRU [if
> > > > > > > > > > > > (!nfsd_file_lru_remove(nf))].
> > > > > > > > > > > > The
> > > > > > > > > > > > 'nf'
> > > > > > > > > > > > has been added
> > > > > > > > > > > >     to the 'dispose' list by nfsd_file_lru_cb(), so
> > > > > > > > > > > > nfsd_file_lru_remove(nf) simply
> > > > > > > > > > > >     treats it as part of the LRU and removes it,
> > > > > > > > > > > > which leads
> > > > > > > > > > > > to
> > > > > > > > > > > > its removal from
> > > > > > > > > > > >     the 'dispose' list.
> > > > > > > > > > > >   * At this moment, 'nf' is unhashed with its nf_ref
> > > > > > > > > > > > being 0,
> > > > > > > > > > > > and
> > > > > > > > > > > > not
> > > > > > > > > > > > on the LRU.
> > > > > > > > > > > >     nfsd_file_put() continues its execution [if
> > > > > > > > > > > > (refcount_dec_and_test(&nf->nf_ref))],
> > > > > > > > > > > >     as nf->nf_ref is already 0, nf->nf_ref is set to
> > > > > > > > > > > > REFCOUNT_SATURATED, and the 'nf'
> > > > > > > > > > > >     gets no chance of being freed.
> > > > > > > > > > > >
> > > > > > > > > > > > nfsd_file_put() can also race with
> > > > > > > > > > > > nfsd_file_cond_queue():
> > > > > > > > > > > >   * In nfsd_file_put(), nf->nf_ref is 1, so it tries
> > > > > > > > > > > > to do
> > > > > > > > > > > > nfsd_file_lru_add().
> > > > > > > > > > > >   * nfsd_file_lru_add() sets REFERENCED bit and
> > > > > > > > > > > > returns true.
> > > > > > > > > > > >   * Some userland application runs 'exportfs -f' or
> > > > > > > > > > > > something
> > > > > > > > > > > > like
> > > > > > > > > > > > that, which triggers
> > > > > > > > > > > >     __nfsd_file_cache_purge() ->
> > > > > > > > > > > > nfsd_file_cond_queue().
> > > > > > > > > > > >   * In nfsd_file_cond_queue(), it runs [if
> > > > > > > > > > > > (!nfsd_file_unhash(nf))],
> > > > > > > > > > > > unhash is done
> > > > > > > > > > > >     successfully.
> > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > (!nfsd_file_get(nf))],
> > > > > > > > > > > > now
> > > > > > > > > > > > nf->nf_ref goes to 2.
> > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > (nfsd_file_lru_remove(nf))],
> > > > > > > > > > > > it succeeds.
> > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > (refcount_sub_and_test(decrement,
> > > > > > > > > > > > &nf->nf_ref))]
> > > > > > > > > > > >     (with "decrement" being 2), so the nf->nf_ref
> > > > > > > > > > > > goes to 0,
> > > > > > > > > > > > the
> > > > > > > > > > > > 'nf'
> > > > > > > > > > > > is added to the
> > > > > > > > > > > >     dispose list [list_add(&nf->nf_lru, dispose)]
> > > > > > > > > > > >   * nfsd_file_put() detects NFSD_FILE_HASHED bit is
> > > > > > > > > > > > cleared,
> > > > > > > > > > > > so
> > > > > > > > > > > > it
> > > > > > > > > > > > tries to remove
> > > > > > > > > > > >     the 'nf' from the LRU [if
> > > > > > > > > > > > (!nfsd_file_lru_remove(nf))],
> > > > > > > > > > > > although
> > > > > > > > > > > > the 'nf' is not
> > > > > > > > > > > >     in the LRU, but it is linked in the 'dispose'
> > > > > > > > > > > > list,
> > > > > > > > > > > > nfsd_file_lru_remove() simply
> > > > > > > > > > > >     treats it as part of the LRU and removes it. This
> > > > > > > > > > > > leads
> > > > > > > > > > > > to
> > > > > > > > > > > > its removal from
> > > > > > > > > > > >     the 'dispose' list!
> > > > > > > > > > > >   * Now nf->ref is 0, unhashed. nfsd_file_put()
> > > > > > > > > > > > continues its
> > > > > > > > > > > > execution and set
> > > > > > > > > > > >     nf->nf_ref to REFCOUNT_SATURATED.
> > > > > > > > > > > >
> > > > > > > > > > > > As shown in the above analysis, using nf_lru for both
> > > > > > > > > > > > the LRU
> > > > > > > > > > > > list and
> > > > > > > > > > > > dispose list
> > > > > > > > > > > > can cause the leaks. This patch adds a new list_head
> > > > > > > > > > > > nf_gc in
> > > > > > > > > > > > struct
> > > > > > > > > > > > nfsd_file, and uses
> > > > > > > > > > > > it for the dispose list. It's not expected to have a
> > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > unhashed but it's not
> > > > > > > > > > > > added to the dispose list, so in
> > > > > > > > > > > > nfsd_file_cond_queue() and
> > > > > > > > > > > > nfsd_file_lru_cb() nfsd_file
> > > > > > > > > > > > is unhashed after being added to the dispose list.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I don't see where we require the object to be either
> > > > > > > > > > > hashed or
> > > > > > > > > > > on
> > > > > > > > > > > the
> > > > > > > > > > > dispose list.  I think you probably just want to do a
> > > > > > > > > > > patch
> > > > > > > > > > > that
> > > > > > > > > > > changes the dispose list to use a dedicated list_head
> > > > > > > > > > > without
> > > > > > > > > > > reordering when the these things are unhashed.
> > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  fs/nfsd/filecache.c | 23 ++++++++++++++---------
> > > > > > > > > > > >  fs/nfsd/filecache.h |  1 +
> > > > > > > > > > > >  2 files changed, 15 insertions(+), 9 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/fs/nfsd/filecache.c
> > > > > > > > > > > > b/fs/nfsd/filecache.c
> > > > > > > > > > > > index ad9083ca144b..3aef2ddfce94 100644
> > > > > > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > > > > > @@ -216,6 +216,7 @@ nfsd_file_alloc(struct net *net,
> > > > > > > > > > > > struct
> > > > > > > > > > > > inode
> > > > > > > > > > > > *inode, unsigned char need,
> > > > > > > > > > > >                 return NULL;
> > > > > > > > > > > >
> > > > > > > > > > > >         INIT_LIST_HEAD(&nf->nf_lru);
> > > > > > > > > > > > +       INIT_LIST_HEAD(&nf->nf_gc);
> > > > > > > > > > > >         nf->nf_birthtime = ktime_get();
> > > > > > > > > > > >         nf->nf_file = NULL;
> > > > > > > > > > > >         nf->nf_cred = get_current_cred();
> > > > > > > > > > > > @@ -393,8 +394,8 @@ nfsd_file_dispose_list(struct
> > > > > > > > > > > > list_head
> > > > > > > > > > > > *dispose)
> > > > > > > > > > > >         struct nfsd_file *nf;
> > > > > > > > > > > >
> > > > > > > > > > > >         while (!list_empty(dispose)) {
> > > > > > > > > > > > -               nf = list_first_entry(dispose, struct
> > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > nf_lru);
> > > > > > > > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > > > > > > > +               nf = list_first_entry(dispose, struct
> > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > nf_gc);
> > > > > > > > > > > > +               list_del_init(&nf->nf_gc);
> > > > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > > > >         }
> > > > > > > > > > > >  }
> > > > > > > > > > > > @@ -411,12 +412,12 @@
> > > > > > > > > > > > nfsd_file_dispose_list_delayed(struct
> > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > >  {
> > > > > > > > > > > >         while(!list_empty(dispose)) {
> > > > > > > > > > > >                 struct nfsd_file *nf =
> > > > > > > > > > > > list_first_entry(dispose,
> > > > > > > > > > > > -
> > > > > > > > > > > > struct
> > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > nf_lru);
> > > > > > > > > > > > +
> > > > > > > > > > > > struct
> > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > nf_gc);
> > > > > > > > > > > >                 struct nfsd_net *nn = net_generic(nf-
> > > > > > > > > > > > > nf_net,
> > > > > > > > > > > > nfsd_net_id);
> > > > > > > > > > > >                 struct nfsd_fcache_disposal *l = nn-
> > > > > > > > > > > > > fcache_disposal;
> > > > > > > > > > > >
> > > > > > > > > > > >                 spin_lock(&l->lock);
> > > > > > > > > > > > -               list_move_tail(&nf->nf_lru, &l-
> > > > > > > > > > > > > freeme);
> > > > > > > > > > > > +               list_move_tail(&nf->nf_gc, &l-
> > > > > > > > > > > > > freeme);
> > > > > > > > > > > >                 spin_unlock(&l->lock);
> > > > > > > > > > > >                 svc_wake_up(nn->nfsd_serv);
> > > > > > > > > > > >         }
> > > > > > > > > > > > @@ -502,8 +503,10 @@ nfsd_file_lru_cb(struct
> > > > > > > > > > > > list_head *item,
> > > > > > > > > > > > struct
> > > > > > > > > > > > list_lru_one *lru,
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > >         /* Refcount went to zero. Unhash it and queue
> > > > > > > > > > > > it to
> > > > > > > > > > > > the
> > > > > > > > > > > > dispose list */
> > > > > > > > > > > > +       list_lru_isolate(lru, &nf->nf_lru);
> > > > > > > > > > > > +       list_add(&nf->nf_gc, head);
> > > > > > > > > > > > +       /* Unhash after removing from LRU and adding
> > > > > > > > > > > > to
> > > > > > > > > > > > dispose
> > > > > > > > > > > > list */
> > > > > > > > > > > >         nfsd_file_unhash(nf);
> > > > > > > > > > > > -       list_lru_isolate_move(lru, &nf->nf_lru,
> > > > > > > > > > > > head);
> > > > > > > > > > >
> > > > > > > > > > > I don't see the point in reordering these operations.
> > > > > > > > > > > Hashing
> > > > > > > > > > > is
> > > > > > > > > > > all
> > > > > > > > > > > about making the thing findable by nfsd operations. The
> > > > > > > > > > > _last_
> > > > > > > > > > > thing we
> > > > > > > > > > > want to do is put it on the dispose list while the
> > > > > > > > > > > thing can
> > > > > > > > > > > still
> > > > > > > > > > > be
> > > > > > > > > > > found by nfsd threads doing operations.
> > > > > > > > > > >
> > > > > > > > > > > >         this_cpu_inc(nfsd_file_evictions);
> > > > > > > > > > > >         trace_nfsd_file_gc_disposed(nf);
> > > > > > > > > > > >         return LRU_REMOVED;
> > > > > > > > > > > > @@ -565,7 +568,7 @@ nfsd_file_cond_queue(struct
> > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > *nf,
> > > > > > > > > > > > struct
> > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > >         int decrement = 1;
> > > > > > > > > > > >
> > > > > > > > > > > >         /* If we raced with someone else unhashing,
> > > > > > > > > > > > ignore it
> > > > > > > > > > > > */
> > > > > > > > > > > > -       if (!nfsd_file_unhash(nf))
> > > > > > > > > > > > +       if (!test_bit(NFSD_FILE_HASHED, &nf-
> > > > > > > > > > > > > nf_flags))
> > > > > > > > > > >
> > > > > > > > > > > The above change looks wrong. I don't think we need to
> > > > > > > > > > > change
> > > > > > > > > > > this.
> > > > > > > > > > >
> > > > > > > > > > > >                 return;
> > > > > > > > > > > >
> > > > > > > > > > > >         /* If we can't get a reference, ignore it */
> > > > > > > > > > > > @@ -578,7 +581,9 @@ nfsd_file_cond_queue(struct
> > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > *nf,
> > > > > > > > > > > > struct
> > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > >
> > > > > > > > > > > >         /* If refcount goes to 0, then put on the
> > > > > > > > > > > > dispose
> > > > > > > > > > > > list */
> > > > > > > > > > > >         if (refcount_sub_and_test(decrement, &nf-
> > > > > > > > > > > > > nf_ref)) {
> > > > > > > > > > > > -               list_add(&nf->nf_lru, dispose);
> > > > > > > > > > > > +               list_add(&nf->nf_gc, dispose);
> > > > > > > > > > > > +               /* Unhash after adding to dispose
> > > > > > > > > > > > list */
> > > > > > > > > > > > +               nfsd_file_unhash(nf);
> > > > > > > > > > >
> > > > > > > > > > > This too looks wrong? Maybe I'm unclear on the race
> > > > > > > > > > > you're
> > > > > > > > > > > trying
> > > > > > > > > > > to
> > > > > > > > > > > fix with this? What's the harm in unhashing it early?
> > > > > > > > > > >
> > > > > > > > > > > >                 trace_nfsd_file_closing(nf);
> > > > > > > > > > > >         }
> > > > > > > > > > > >  }
> > > > > > > > > > > > @@ -654,8 +659,8 @@ nfsd_file_close_inode_sync(struct
> > > > > > > > > > > > inode
> > > > > > > > > > > > *inode)
> > > > > > > > > > > >
> > > > > > > > > > > >         nfsd_file_queue_for_close(inode, &dispose);
> > > > > > > > > > > >         while (!list_empty(&dispose)) {
> > > > > > > > > > > > -               nf = list_first_entry(&dispose,
> > > > > > > > > > > > struct
> > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > nf_lru);
> > > > > > > > > > > > -               list_del_init(&nf->nf_lru);
> > > > > > > > > > > > +               nf = list_first_entry(&dispose,
> > > > > > > > > > > > struct
> > > > > > > > > > > > nfsd_file,
> > > > > > > > > > > > nf_gc);
> > > > > > > > > > > > +               list_del_init(&nf->nf_gc);
> > > > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > > > >         }
> > > > > > > > > > > >  }
> > > > > > > > > > > > diff --git a/fs/nfsd/filecache.h
> > > > > > > > > > > > b/fs/nfsd/filecache.h
> > > > > > > > > > > > index c61884def906..3fbec24eea6c 100644
> > > > > > > > > > > > --- a/fs/nfsd/filecache.h
> > > > > > > > > > > > +++ b/fs/nfsd/filecache.h
> > > > > > > > > > > > @@ -44,6 +44,7 @@ struct nfsd_file {
> > > > > > > > > > > >
> > > > > > > > > > > >         struct nfsd_file_mark   *nf_mark;
> > > > > > > > > > > >         struct list_head        nf_lru;
> > > > > > > > > > > > +       struct list_head        nf_gc;
> > > > > > > > > > > >         struct rcu_head         nf_rcu;
> > > > > > > > > > > >         ktime_t                 nf_birthtime;
> > > > > > > > > > > >  };
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.34.1
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jul 4, 2024 at 7:14 AM Jeff Layton
> > > > > > > > > > > > <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, 2024-07-03 at 16:46 -0400, Youzhong Yang
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > Thank you Chuck. Here are my quick answers to
> > > > > > > > > > > > > > your
> > > > > > > > > > > > > > comments:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - I don't have a quick reproducer. I reproduced
> > > > > > > > > > > > > > it by
> > > > > > > > > > > > > > using
> > > > > > > > > > > > > > hundreds
> > > > > > > > > > > > > > of nfs clients generating +600K ops under our
> > > > > > > > > > > > > > workload in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > testing
> > > > > > > > > > > > > > environment. Theoretically it should be possible
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > simplify
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > reproduction but I am still working on it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -  I understand zfs is an out-of-tree file
> > > > > > > > > > > > > > system. That's
> > > > > > > > > > > > > > fine. But
> > > > > > > > > > > > > > this leaking can happen to any file system, and
> > > > > > > > > > > > > > leaking
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > not a good
> > > > > > > > > > > > > > thing no matter what file system it is.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -  I will try to come up with a reproducer using
> > > > > > > > > > > > > > xfs or
> > > > > > > > > > > > > > btrfs
> > > > > > > > > > > > > > if possible.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Now back to the problem itself, here are my
> > > > > > > > > > > > > > findings:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - nfsd_file_put() in one thread can race with
> > > > > > > > > > > > > > another
> > > > > > > > > > > > > > thread
> > > > > > > > > > > > > > doing
> > > > > > > > > > > > > > garbage collection (running nfsd_file_gc() ->
> > > > > > > > > > > > > > list_lru_walk()
> > > > > > > > > > > > > > ->
> > > > > > > > > > > > > > nfsd_file_lru_cb()):
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   * In nfsd_file_put(), nf->nf_ref is 1, so it
> > > > > > > > > > > > > > tries to
> > > > > > > > > > > > > > do
> > > > > > > > > > > > > > nfsd_file_lru_add().
> > > > > > > > > > > > > >   * nfsd_file_lru_add() returns true (thus
> > > > > > > > > > > > > > NFSD_FILE_REFERENCED bit
> > > > > > > > > > > > > > set for nf->nf_flags)
> > > > > > > > > > > > > >   * garbage collector kicks in,
> > > > > > > > > > > > > > nfsd_file_lru_cb() clears
> > > > > > > > > > > > > > REFERENCED
> > > > > > > > > > > > > > bit and returns LRU_ROTATE.
> > > > > > > > > > > > > >   * garbage collector kicks in again,
> > > > > > > > > > > > > > nfsd_file_lru_cb()
> > > > > > > > > > > > > > now
> > > > > > > > > > > > > > decrements nf->nf_ref to 0, runs
> > > > > > > > > > > > > > nfsd_file_unhash(),
> > > > > > > > > > > > > > removes
> > > > > > > > > > > > > > it from
> > > > > > > > > > > > > > the LRU and adds to the dispose list
> > > > > > > > > > > > > > [list_lru_isolate_move(lru,
> > > > > > > > > > > > > > &nf->nf_lru, head);]
> > > > > > > > > > > > > >   * nfsd_file_put() detects NFSD_FILE_HASHED bit
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > cleared,
> > > > > > > > > > > > > > so it
> > > > > > > > > > > > > > tries to remove the 'nf' from the LRU [if
> > > > > > > > > > > > > > (!nfsd_file_lru_remove(nf))]
> > > > > > > > > > > > > >   * The 'nf' has been added to the 'dispose' list
> > > > > > > > > > > > > > by
> > > > > > > > > > > > > > nfsd_file_lru_cb(), so nfsd_file_lru_remove(nf)
> > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > treats
> > > > > > > > > > > > > > it as
> > > > > > > > > > > > > > part of the LRU and removes it, which leads it to
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > removed
> > > > > > > > > > > > > > from the
> > > > > > > > > > > > > > 'dispose' list.
> > > > > > > > > > > > > >   * At this moment, nf->nf_ref is 0, it's
> > > > > > > > > > > > > > unhashed, and
> > > > > > > > > > > > > > not
> > > > > > > > > > > > > > on the
> > > > > > > > > > > > > > LRU. nfsd_file_put() continues its execution [if
> > > > > > > > > > > > > > (refcount_dec_and_test(&nf->nf_ref))], as nf-
> > > > > > > > > > > > > > > nf_ref is
> > > > > > > > > > > > > > already 0, now
> > > > > > > > > > > > > > bad thing happens: nf->nf_ref is set to
> > > > > > > > > > > > > > REFCOUNT_SATURATED,
> > > > > > > > > > > > > > and the
> > > > > > > > > > > > > > 'nf' is leaked.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > To make this happen, the right timing is crucial.
> > > > > > > > > > > > > > It can
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > reproduced
> > > > > > > > > > > > > > by adding artifical delays in filecache.c, or
> > > > > > > > > > > > > > hammering
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > nfsd with
> > > > > > > > > > > > > > tons of ops.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Let's see how nfsd_file_put() can race with
> > > > > > > > > > > > > > nfsd_file_cond_queue():
> > > > > > > > > > > > > >   * In nfsd_file_put(), nf->nf_ref is 1, so it
> > > > > > > > > > > > > > tries to
> > > > > > > > > > > > > > do
> > > > > > > > > > > > > > nfsd_file_lru_add().
> > > > > > > > > > > > > >   * nfsd_file_lru_add() sets REFERENCED bit and
> > > > > > > > > > > > > > returns
> > > > > > > > > > > > > > true.
> > > > > > > > > > > > > >   * 'exportfs -f' or something like that triggers
> > > > > > > > > > > > > > __nfsd_file_cache_purge() ->
> > > > > > > > > > > > > > nfsd_file_cond_queue().
> > > > > > > > > > > > > >   * In nfsd_file_cond_queue(), it runs [if
> > > > > > > > > > > > > > (!nfsd_file_unhash(nf))],
> > > > > > > > > > > > > > unhash is done successfully.
> > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > (!nfsd_file_get(nf))],
> > > > > > > > > > > > > > now
> > > > > > > > > > > > > > nf->nf_ref goes to 2.
> > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > (nfsd_file_lru_remove(nf))], it succeeds.
> > > > > > > > > > > > > >   * nfsd_file_cond_queue() runs [if
> > > > > > > > > > > > > > (refcount_sub_and_test(decrement,
> > > > > > > > > > > > > > &nf->nf_ref))] (with "decrement" being 2), so the
> > > > > > > > > > > > > > nf-
> > > > > > > > > > > > > > > nf_ref
> > > > > > > > > > > > > > goes to
> > > > > > > > > > > > > > 0, the 'nf' is added to the dispost list
> > > > > > > > > > > > > > [list_add(&nf-
> > > > > > > > > > > > > > > nf_lru,
> > > > > > > > > > > > > > dispose)]
> > > > > > > > > > > > > >   * nfsd_file_put() detects NFSD_FILE_HASHED bit
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > cleared,
> > > > > > > > > > > > > > so it
> > > > > > > > > > > > > > tries to remove the 'nf' from the LRU [if
> > > > > > > > > > > > > > (!nfsd_file_lru_remove(nf))], although the 'nf'
> > > > > > > > > > > > > > is not in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > LRU, but
> > > > > > > > > > > > > > it is linked in the 'dispose' list,
> > > > > > > > > > > > > > nfsd_file_lru_remove()
> > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > treats it as part of the LRU and removes it. This
> > > > > > > > > > > > > > leads
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > its removal
> > > > > > > > > > > > > > from the 'dispose' list!
> > > > > > > > > > > > > >   * Now nf->ref is 0, unhashed. nfsd_file_put()
> > > > > > > > > > > > > > continues
> > > > > > > > > > > > > > its
> > > > > > > > > > > > > > execution and sets nf->nf_ref to
> > > > > > > > > > > > > > REFCOUNT_SATURATED.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The purpose of nf->nf_lru is problematic. As you
> > > > > > > > > > > > > > can see,
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > is used
> > > > > > > > > > > > > > for the LRU list, and also the 'dispose' list.
> > > > > > > > > > > > > > Adding
> > > > > > > > > > > > > > another
> > > > > > > > > > > > > > 'struct
> > > > > > > > > > > > > > list_head' specifically for the 'dispose' list
> > > > > > > > > > > > > > seems to
> > > > > > > > > > > > > > be a
> > > > > > > > > > > > > > better
> > > > > > > > > > > > > > way of fixing this race condition. Either way
> > > > > > > > > > > > > > works for
> > > > > > > > > > > > > > me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Would you agree my above analysis makes sense?
> > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think so. It's been a while since I've done much
> > > > > > > > > > > > > work in
> > > > > > > > > > > > > this
> > > > > > > > > > > > > code,
> > > > > > > > > > > > > but it does sound like there is a race in the LRU
> > > > > > > > > > > > > handling.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Like Chuck said, the nf->nf_lru list should be safe
> > > > > > > > > > > > > to use
> > > > > > > > > > > > > for
> > > > > > > > > > > > > multiple
> > > > > > > > > > > > > purposes, but that's only the case if we're not
> > > > > > > > > > > > > using that
> > > > > > > > > > > > > list
> > > > > > > > > > > > > as an
> > > > > > > > > > > > > indicator.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The list_lru code does check this:
> > > > > > > > > > > > >
> > > > > > > > > > > > >     if (!list_empty(item)) {
> > > > > > > > > > > > >
> > > > > > > > > > > > > ...so if we ever check this while it's sitting on
> > > > > > > > > > > > > the
> > > > > > > > > > > > > dispose
> > > > > > > > > > > > > list, it
> > > > > > > > > > > > > will handle it incorrectly. It sounds like that's
> > > > > > > > > > > > > the root
> > > > > > > > > > > > > cause of the
> > > > > > > > > > > > > problem you're seeing?
> > > > > > > > > > > > >
> > > > > > > > > > > > > If so, then maybe a separate list_head for disposal
> > > > > > > > > > > > > would
> > > > > > > > > > > > > be
> > > > > > > > > > > > > better.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Here is my patch with signed-off-by:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > From: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > > > > > > > > > > > Date: Mon, 1 Jul 2024 06:45:22 -0400
> > > > > > > > > > > > > > Subject: [PATCH] nfsd: fix nfsd_file leaking due
> > > > > > > > > > > > > > to race
> > > > > > > > > > > > > > condition and early
> > > > > > > > > > > > > >  unhash
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Youzhong Yang <youzhong@xxxxxxxxx>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  fs/nfsd/filecache.c | 14 +++++++++++++-
> > > > > > > > > > > > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > index 1a6d5d000b85..2323829f7208 100644
> > > > > > > > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > @@ -389,6 +389,17 @@ nfsd_file_put(struct
> > > > > > > > > > > > > > nfsd_file *nf)
> > > > > > > > > > > > > >                         if
> > > > > > > > > > > > > > (!nfsd_file_lru_remove(nf))
> > > > > > > > > > > > > >                                 return;
> > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > +               /*
> > > > > > > > > > > > > > +                * Racing with
> > > > > > > > > > > > > > nfsd_file_cond_queue() or
> > > > > > > > > > > > > > nfsd_file_lru_cb(),
> > > > > > > > > > > > > > +                * it's unhashed but then removed
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > dispose list,
> > > > > > > > > > > > > > +                * so we need to free it.
> > > > > > > > > > > > > > +                */
> > > > > > > > > > > > > > +               if (refcount_read(&nf->nf_ref) ==
> > > > > > > > > > > > > > 0 &&
> > > > > > > > > > > > >
> > > > > > > > > > > > > A refcount_read in this path is a red flag to me.
> > > > > > > > > > > > > Anytime
> > > > > > > > > > > > > you're just
> > > > > > > > > > > > > looking at the refcount without changing anything
> > > > > > > > > > > > > just
> > > > > > > > > > > > > screams
> > > > > > > > > > > > > out
> > > > > > > > > > > > > "race condition".
> > > > > > > > > > > > >
> > > > > > > > > > > > > In this case, what guarantee is there that this
> > > > > > > > > > > > > won't run
> > > > > > > > > > > > > afoul
> > > > > > > > > > > > > of the
> > > > > > > > > > > > > timing? We could check this and find out it's 1
> > > > > > > > > > > > > just before
> > > > > > > > > > > > > it
> > > > > > > > > > > > > goes to
> > > > > > > > > > > > > 0 and you check the other conditions.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Does anything prevent that?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +                   !test_bit(NFSD_FILE_HASHED,
> > > > > > > > > > > > > > &nf-
> > > > > > > > > > > > > > > nf_flags) &&
> > > > > > > > > > > > > > +                   list_empty(&nf->nf_lru)) {
> > > > > > > > > > > > > > +                       nfsd_file_free(nf);
> > > > > > > > > > > > > > +                       return;
> > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >         if (refcount_dec_and_test(&nf->nf_ref))
> > > > > > > > > > > > > >                 nfsd_file_free(nf);
> > > > > > > > > > > > > > @@ -576,7 +587,7 @@ nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > *nf, struct
> > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > >         int decrement = 1;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         /* If we raced with someone else
> > > > > > > > > > > > > > unhashing,
> > > > > > > > > > > > > > ignore it
> > > > > > > > > > > > > > */
> > > > > > > > > > > > > > -       if (!nfsd_file_unhash(nf))
> > > > > > > > > > > > > > +       if (!test_bit(NFSD_FILE_HASHED, &nf-
> > > > > > > > > > > > > > > nf_flags))
> > > > > > > > > > > > > >                 return;
> > > > > > > > > > > > >
> > > > > > > > > > > > > Same here: you're just testing for the HASHED bit,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > could
> > > > > > > > > > > > > this also
> > > > > > > > > > > > > race with someone who is setting it just after you
> > > > > > > > > > > > > get
> > > > > > > > > > > > > here.
> > > > > > > > > > > > > Why is
> > > > > > > > > > > > > that not a problem?
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         /* If we can't get a reference, ignore it
> > > > > > > > > > > > > > */
> > > > > > > > > > > > > > @@ -590,6 +601,7 @@ nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > *nf, struct
> > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > >         /* If refcount goes to 0, then put on the
> > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > list */
> > > > > > > > > > > > > >         if (refcount_sub_and_test(decrement, &nf-
> > > > > > > > > > > > > > > nf_ref)) {
> > > > > > > > > > > > > >                 list_add(&nf->nf_lru, dispose);
> > > > > > > > > > > > > > +               nfsd_file_unhash(nf);
> > > > > > > > > > > > > >                 trace_nfsd_file_closing(nf);
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.43.0
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jul 3, 2024 at 2:21 PM Chuck Lever
> > > > > > > > > > > > > > <chuck.lever@xxxxxxxxxx> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jul 03, 2024 at 10:12:33AM -0400,
> > > > > > > > > > > > > > > Youzhong Yang
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'd like to report a nfsd_file leaking issue
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > propose
> > > > > > > > > > > > > > > > a fix for it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > When I tested Linux kernel 6.8 and 6.6, I
> > > > > > > > > > > > > > > > noticed
> > > > > > > > > > > > > > > > nfsd_file leaks
> > > > > > > > > > > > > > > > which led to undestroyable file systems
> > > > > > > > > > > > > > > > (zfs),
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the report. Some initial comments:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Do you have a specific reproducer? In other
> > > > > > > > > > > > > > > words,
> > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > is the
> > > > > > > > > > > > > > >   simplest program that can run on an NFS
> > > > > > > > > > > > > > > client that
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > trigger
> > > > > > > > > > > > > > >   this leak, and can you post it?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - "zfs" is an out-of-tree file system, so it's
> > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > directly
> > > > > > > > > > > > > > >   supported for NFSD.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - The guidelines for patch submission require
> > > > > > > > > > > > > > > us to fix
> > > > > > > > > > > > > > > issues in
> > > > > > > > > > > > > > >   upstream Linux first (currently that's v6.10-
> > > > > > > > > > > > > > > rc6).
> > > > > > > > > > > > > > > Then
> > > > > > > > > > > > > > > that fix
> > > > > > > > > > > > > > >   can be backported to older stable kernels
> > > > > > > > > > > > > > > like 6.6.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you reproduce the leak with one of the in-
> > > > > > > > > > > > > > > kernel
> > > > > > > > > > > > > > > filesystems
> > > > > > > > > > > > > > > (either xfs or btrfs would be great) and with
> > > > > > > > > > > > > > > NFSD in
> > > > > > > > > > > > > > > 6.10-
> > > > > > > > > > > > > > > rc6?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One more comment below.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > here are some examples:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > crash> struct nfsd_file -x ffff88e160db0460
> > > > > > > > > > > > > > > > struct nfsd_file {
> > > > > > > > > > > > > > > >   nf_rlist = {
> > > > > > > > > > > > > > > >     rhead = {
> > > > > > > > > > > > > > > >       next = 0xffff8921fa2392f1
> > > > > > > > > > > > > > > >     },
> > > > > > > > > > > > > > > >     next = 0x0
> > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > >   nf_inode = 0xffff8882bc312ef8,
> > > > > > > > > > > > > > > >   nf_file = 0xffff88e2015b1500,
> > > > > > > > > > > > > > > >   nf_cred = 0xffff88e3ab0e7800,
> > > > > > > > > > > > > > > >   nf_net = 0xffffffff83d41600 <init_net>,
> > > > > > > > > > > > > > > >   nf_flags = 0x8,
> > > > > > > > > > > > > > > >   nf_ref = {
> > > > > > > > > > > > > > > >     refs = {
> > > > > > > > > > > > > > > >       counter = 0xc0000000
> > > > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > >   nf_may = 0x4,
> > > > > > > > > > > > > > > >   nf_mark = 0xffff88e1bddfb320,
> > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > >     next = 0xffff88e160db04a8,
> > > > > > > > > > > > > > > >     prev = 0xffff88e160db04a8
> > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > >   nf_rcu = {
> > > > > > > > > > > > > > > >     next = 0x10000000000,
> > > > > > > > > > > > > > > >     func = 0x0
> > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > >   nf_birthtime = 0x73d22fc1728
> > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > crash> struct
> > > > > > > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru
> > > > > > > > > > > > > > > > ,nf_file
> > > > > > > > > > > > > > > > -x
> > > > > > > > > > > > > > > > ffff88839a53d850
> > > > > > > > > > > > > > > >   nf_flags = 0x8,
> > > > > > > > > > > > > > > >   nf_ref.refs.counter = 0x0
> > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > >     next = 0xffff88839a53d898,
> > > > > > > > > > > > > > > >     prev = 0xffff88839a53d898
> > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > >   nf_file = 0xffff88810ede8700,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > crash> struct
> > > > > > > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru
> > > > > > > > > > > > > > > > ,nf_file
> > > > > > > > > > > > > > > > -x
> > > > > > > > > > > > > > > > ffff88c32b11e850
> > > > > > > > > > > > > > > >   nf_flags = 0x8,
> > > > > > > > > > > > > > > >   nf_ref.refs.counter = 0x0
> > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > >     next = 0xffff88c32b11e898,
> > > > > > > > > > > > > > > >     prev = 0xffff88c32b11e898
> > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > >   nf_file = 0xffff88c20a701c00,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > crash> struct
> > > > > > > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru
> > > > > > > > > > > > > > > > ,nf_file
> > > > > > > > > > > > > > > > -x
> > > > > > > > > > > > > > > > ffff88e372709700
> > > > > > > > > > > > > > > >   nf_flags = 0xc,
> > > > > > > > > > > > > > > >   nf_ref.refs.counter = 0x0
> > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > >     next = 0xffff88e372709748,
> > > > > > > > > > > > > > > >     prev = 0xffff88e372709748
> > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > >   nf_file = 0xffff88e0725e6400,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > crash> struct
> > > > > > > > > > > > > > > > nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru
> > > > > > > > > > > > > > > > ,nf_file
> > > > > > > > > > > > > > > > -x
> > > > > > > > > > > > > > > > ffff8982864944d0
> > > > > > > > > > > > > > > >   nf_flags = 0xc,
> > > > > > > > > > > > > > > >   nf_ref.refs.counter = 0x0
> > > > > > > > > > > > > > > >   nf_lru = {
> > > > > > > > > > > > > > > >     next = 0xffff898286494518,
> > > > > > > > > > > > > > > >     prev = 0xffff898286494518
> > > > > > > > > > > > > > > >   },
> > > > > > > > > > > > > > > >   nf_file = 0xffff89803c0ff700,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The leak occurs when nfsd_file_put() races
> > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > nfsd_file_cond_queue()
> > > > > > > > > > > > > > > > or nfsd_file_lru_cb(). With the following
> > > > > > > > > > > > > > > > patch, I
> > > > > > > > > > > > > > > > haven't observed
> > > > > > > > > > > > > > > > any leak after a few days heavy nfs load:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Our patch submission guidelines require a
> > > > > > > > > > > > > > > Signed-off-
> > > > > > > > > > > > > > > by:
> > > > > > > > > > > > > > > line at the end of the patch description. See
> > > > > > > > > > > > > > > the "Sign
> > > > > > > > > > > > > > > your work -
> > > > > > > > > > > > > > > the Developer's Certificate of Origin" section
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.10-rc6
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Needed here in case your fix is acceptable).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > index 1a6d5d000b85..2323829f7208 100644
> > > > > > > > > > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > > > > > > > > > @@ -389,6 +389,17 @@ nfsd_file_put(struct
> > > > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > > > *nf)
> > > > > > > > > > > > > > > >   if (!nfsd_file_lru_remove(nf))
> > > > > > > > > > > > > > > >   return;
> > > > > > > > > > > > > > > >   }
> > > > > > > > > > > > > > > > + /*
> > > > > > > > > > > > > > > > + * Racing with nfsd_file_cond_queue() or
> > > > > > > > > > > > > > > > nfsd_file_lru_cb(),
> > > > > > > > > > > > > > > > + * it's unhashed but then removed from the
> > > > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > > > list,
> > > > > > > > > > > > > > > > + * so we need to free it.
> > > > > > > > > > > > > > > > + */
> > > > > > > > > > > > > > > > + if (refcount_read(&nf->nf_ref) == 0 &&
> > > > > > > > > > > > > > > > +     !test_bit(NFSD_FILE_HASHED, &nf-
> > > > > > > > > > > > > > > > > nf_flags) &&
> > > > > > > > > > > > > > > > +     list_empty(&nf->nf_lru)) {
> > > > > > > > > > > > > > > > + nfsd_file_free(nf);
> > > > > > > > > > > > > > > > + return;
> > > > > > > > > > > > > > > > + }
> > > > > > > > > > > > > > > >   }
> > > > > > > > > > > > > > > >   if (refcount_dec_and_test(&nf->nf_ref))
> > > > > > > > > > > > > > > >   nfsd_file_free(nf);
> > > > > > > > > > > > > > > > @@ -576,7 +587,7 @@
> > > > > > > > > > > > > > > > nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > > > *nf, struct
> > > > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > > > >   int decrement = 1;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   /* If we raced with someone else unhashing,
> > > > > > > > > > > > > > > > ignore
> > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > */
> > > > > > > > > > > > > > > > - if (!nfsd_file_unhash(nf))
> > > > > > > > > > > > > > > > + if (!test_bit(NFSD_FILE_HASHED, &nf-
> > > > > > > > > > > > > > > > > nf_flags))
> > > > > > > > > > > > > > > >   return;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   /* If we can't get a reference, ignore it
> > > > > > > > > > > > > > > > */
> > > > > > > > > > > > > > > > @@ -590,6 +601,7 @@
> > > > > > > > > > > > > > > > nfsd_file_cond_queue(struct
> > > > > > > > > > > > > > > > nfsd_file
> > > > > > > > > > > > > > > > *nf, struct
> > > > > > > > > > > > > > > > list_head *dispose)
> > > > > > > > > > > > > > > >   /* If refcount goes to 0, then put on the
> > > > > > > > > > > > > > > > dispose
> > > > > > > > > > > > > > > > list
> > > > > > > > > > > > > > > > */
> > > > > > > > > > > > > > > >   if (refcount_sub_and_test(decrement, &nf-
> > > > > > > > > > > > > > > > > nf_ref))
> > > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > > >   list_add(&nf->nf_lru, dispose);
> > > > > > > > > > > > > > > > + nfsd_file_unhash(nf);
> > > > > > > > > > > > > > > >   trace_nfsd_file_closing(nf);
> > > > > > > > > > > > > > > >   }
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Please kindly review the patch and let me
> > > > > > > > > > > > > > > > know if it
> > > > > > > > > > > > > > > > makes sense.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -Youzhong
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Chuck Lever
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > > >
> > > > > --
> > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > >
> > > --
> > > Jeff Layton <jlayton@xxxxxxxxxx>
> >
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux