Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/24/21 6:26 PM, dai.ngo@xxxxxxxxxx wrote:
Hi Olga and Bruce,

On 2/24/21 2:35 PM, Olga Kornievskaia wrote:
On Mon, Feb 22, 2021 at 5:09 PM <dai.ngo@xxxxxxxxxx> wrote:

On 2/22/21 2:01 PM, dai.ngo@xxxxxxxxxx wrote:
On 2/22/21 1:46 PM, dai.ngo@xxxxxxxxxx wrote:
On 2/22/21 10:34 AM, dai.ngo@xxxxxxxxxx wrote:
On 2/20/21 8:16 PM, dai.ngo@xxxxxxxxxx wrote:
On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
<bfields@xxxxxxxxxxxx> wrote:
On Fri, Feb 19, 2021 at 05:31:58PM -0800, dai.ngo@xxxxxxxxxx wrote:
If this is the cause why we don't drop the mount after the copy
then I can restore the patch and look into this problem.
Unfortunately,
all my test machines are down for maintenance until Sunday/Monday.
I think we can take some time to figure out what's actually going on
here before reverting anything.
Yes I agree. We need to fix the use-after-free and also make sure
that
reference will go away.
I reverted the patch, verified the warning message is back:

Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------ Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.

then did a inter-server copy and waited for more than 20 mins and
the destination server still maintains the session with the source
server.  It must be some other references that prevent the mount
to go away.
This change fixed the unmount after inter-server copy problem:

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 8d6d2678abad..87687cd18938 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
*ss_mnt, struct nfsd_file *src,
                         struct nfsd_file *dst)
  {
         nfs42_ssc_close(src->nf_file);
-       /* 'src' is freed by nfsd4_do_async_copy */
+       nfsd_file_put(src);
         nfsd_file_put(dst);
         mntput(ss_mnt);
  }
This change is not need. It's left over from my testing to
reproduce the warning messages. Only the change in
nfsd4_do_async_copy is needed for the unmount problem.

-Dai

@@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
                 copy->nf_src = kzalloc(sizeof(struct nfsd_file),
GFP_KERNEL);
                 if (!copy->nf_src) {
                         copy->nfserr = nfserr_serverfault;
- nfsd4_interssc_disconnect(copy->ss_mnt);
                         goto do_callback;
                 }
                 copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
&copy->c_fh,
&copy->stateid);
                 if (IS_ERR(copy->nf_src->nf_file)) {
                         copy->nfserr = nfserr_offload_denied;
- nfsd4_interssc_disconnect(copy->ss_mnt);
                         goto do_callback;
                 }
         }
@@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
                         &nfsd4_cb_offload_ops,
NFSPROC4_CLNT_CB_OFFLOAD);
         nfsd4_run_cb(&cb_copy->cp_cb);
  out:
+       nfsd4_interssc_disconnect(copy->ss_mnt);
         if (!copy->cp_intra)
                 kfree(copy->nf_src);
         cleanup_async_copy(copy);

But there is something new. I tried inter-server copy twice.
First time I can verify from tshark capture that a session was
created and destroy, along with all the NFS ops. On 2nd try,
I can
Hi Dai/Bruce,

While I believe the fix works (as in the mount goes away), I'm not
comfortable with this fix as I believe we will be leaking resources.
Server calls nfs42_ssc_open() which creates a legit file pointer (yes
it takes a reference on superblock but it also allocates memory for
"file" structure). Normally a file structure requires doing an fput()
which if I look thru the code does a bunch of things before in the end
calling mntput(). While we free the copy->nf_src in
nfs4_do_asyn_copy(), the copy->nf_src->nf_file was allocated
separately and would have been freed from calling fput() on it.

So I guess it's not correct to do kfree(copy->nf_src) in teh
nfs4_do_async_copy() I think that's probably where use-after-free
comes in. We need to keep it until the cleanup. I'm thinking perhaps
call fput(copy->nf_src->nf_file) first and then free it? Just an idea
but I haven't tested it.

I think the unmount can be treated separately and the fix seems
to be valid for this issue. I think kfree(copy->nf_src) is called
after nfsd4_cleanup_inter_ssc so it's not the reason for the
'use-after-free' warning. A quick look at nfsd4_do_async_copy
I see nf_src was kzalloc'ed but no ref count added to it.

I will review the whole cleanup part and report back.

I think this would fix the resource leak problem:

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 57b3821d975a..742fc128fdc8 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -405,6 +405,11 @@ static void __nfs42_ssc_close(struct file *filep)
        struct nfs_open_context *ctx = nfs_file_open_context(filep);
ctx->state->flags = 0;
+       if (!filep)
+               return;
+       get_file(filep);
+       filp_close(filep, NULL);
+       fput(filep);
 }
static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = {

-Dai




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux