Re: Leaked nfsd_file due to race condition and early unhash (fs/nfsd/filecache.c)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 03, 2024 at 10:12:33AM -0400, Youzhong Yang wrote:
> Hello,
> 
> I'd like to report a nfsd_file leaking issue and propose a fix for it.
> 
> When I tested Linux kernel 6.8 and 6.6, I noticed nfsd_file leaks
> which led to undestroyable file systems (zfs),

Thanks for the report. Some initial comments:

- Do you have a specific reproducer? In other words, what is the
  simplest program that can run on an NFS client that will trigger
  this leak, and can you post it?

- "zfs" is an out-of-tree file system, so it's not directly
  supported for NFSD.

- The guidelines for patch submission require us to fix issues in
  upstream Linux first (currently that's v6.10-rc6). Then that fix
  can be backported to older stable kernels like 6.6.

Can you reproduce the leak with one of the in-kernel filesystems
(either xfs or btrfs would be great) and with NFSD in 6.10-rc6?

One more comment below.


> here are some examples:
> 
> crash> struct nfsd_file -x ffff88e160db0460
> struct nfsd_file {
>   nf_rlist = {
>     rhead = {
>       next = 0xffff8921fa2392f1
>     },
>     next = 0x0
>   },
>   nf_inode = 0xffff8882bc312ef8,
>   nf_file = 0xffff88e2015b1500,
>   nf_cred = 0xffff88e3ab0e7800,
>   nf_net = 0xffffffff83d41600 <init_net>,
>   nf_flags = 0x8,
>   nf_ref = {
>     refs = {
>       counter = 0xc0000000
>     }
>   },
>   nf_may = 0x4,
>   nf_mark = 0xffff88e1bddfb320,
>   nf_lru = {
>     next = 0xffff88e160db04a8,
>     prev = 0xffff88e160db04a8
>   },
>   nf_rcu = {
>     next = 0x10000000000,
>     func = 0x0
>   },
>   nf_birthtime = 0x73d22fc1728
> }
> 
> crash> struct nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru,nf_file -x
> ffff88839a53d850
>   nf_flags = 0x8,
>   nf_ref.refs.counter = 0x0
>   nf_lru = {
>     next = 0xffff88839a53d898,
>     prev = 0xffff88839a53d898
>   },
>   nf_file = 0xffff88810ede8700,
> 
> crash> struct nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru,nf_file -x
> ffff88c32b11e850
>   nf_flags = 0x8,
>   nf_ref.refs.counter = 0x0
>   nf_lru = {
>     next = 0xffff88c32b11e898,
>     prev = 0xffff88c32b11e898
>   },
>   nf_file = 0xffff88c20a701c00,
> 
> crash> struct nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru,nf_file -x
> ffff88e372709700
>   nf_flags = 0xc,
>   nf_ref.refs.counter = 0x0
>   nf_lru = {
>     next = 0xffff88e372709748,
>     prev = 0xffff88e372709748
>   },
>   nf_file = 0xffff88e0725e6400,
> 
> crash> struct nfsd_file.nf_flags,nf_ref.refs.counter,nf_lru,nf_file -x
> ffff8982864944d0
>   nf_flags = 0xc,
>   nf_ref.refs.counter = 0x0
>   nf_lru = {
>     next = 0xffff898286494518,
>     prev = 0xffff898286494518
>   },
>   nf_file = 0xffff89803c0ff700,
> 
> The leak occurs when nfsd_file_put() races with nfsd_file_cond_queue()
> or nfsd_file_lru_cb(). With the following patch, I haven't observed
> any leak after a few days heavy nfs load:

Our patch submission guidelines require a Signed-off-by:
line at the end of the patch description. See the "Sign your work -
the Developer's Certificate of Origin" section of

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.10-rc6

(Needed here in case your fix is acceptable).


> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 1a6d5d000b85..2323829f7208 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -389,6 +389,17 @@ nfsd_file_put(struct nfsd_file *nf)
>   if (!nfsd_file_lru_remove(nf))
>   return;
>   }
> + /*
> + * Racing with nfsd_file_cond_queue() or nfsd_file_lru_cb(),
> + * it's unhashed but then removed from the dispose list,
> + * so we need to free it.
> + */
> + if (refcount_read(&nf->nf_ref) == 0 &&
> +     !test_bit(NFSD_FILE_HASHED, &nf->nf_flags) &&
> +     list_empty(&nf->nf_lru)) {
> + nfsd_file_free(nf);
> + return;
> + }
>   }
>   if (refcount_dec_and_test(&nf->nf_ref))
>   nfsd_file_free(nf);
> @@ -576,7 +587,7 @@ nfsd_file_cond_queue(struct nfsd_file *nf, struct
> list_head *dispose)
>   int decrement = 1;
> 
>   /* If we raced with someone else unhashing, ignore it */
> - if (!nfsd_file_unhash(nf))
> + if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags))
>   return;
> 
>   /* If we can't get a reference, ignore it */
> @@ -590,6 +601,7 @@ nfsd_file_cond_queue(struct nfsd_file *nf, struct
> list_head *dispose)
>   /* If refcount goes to 0, then put on the dispose list */
>   if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
>   list_add(&nf->nf_lru, dispose);
> + nfsd_file_unhash(nf);
>   trace_nfsd_file_closing(nf);
>   }
>  }
> 
> Please kindly review the patch and let me know if it makes sense.
> 
> Thanks,
> 
> -Youzhong
> 

-- 
Chuck Lever




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux