Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/22/21 2:01 PM, dai.ngo@xxxxxxxxxx wrote:

On 2/22/21 1:46 PM, dai.ngo@xxxxxxxxxx wrote:

On 2/22/21 10:34 AM, dai.ngo@xxxxxxxxxx wrote:

On 2/20/21 8:16 PM, dai.ngo@xxxxxxxxxx wrote:
On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
On Fri, Feb 19, 2021 at 05:31:58PM -0800, dai.ngo@xxxxxxxxxx wrote:
If this is the cause why we don't drop the mount after the copy
then I can restore the patch and look into this problem. Unfortunately,
all my test machines are down for maintenance until Sunday/Monday.
I think we can take some time to figure out what's actually going on
here before reverting anything.
Yes I agree. We need to fix the use-after-free and also make sure that reference will go away.

I reverted the patch, verified the warning message is back:

Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.

then did a inter-server copy and waited for more than 20 mins and
the destination server still maintains the session with the source
server.  It must be some other references that prevent the mount
to go away.

This change fixed the unmount after inter-server copy problem:

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 8d6d2678abad..87687cd18938 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
                        struct nfsd_file *dst)
 {
        nfs42_ssc_close(src->nf_file);
-       /* 'src' is freed by nfsd4_do_async_copy */
+       nfsd_file_put(src);
        nfsd_file_put(dst);
        mntput(ss_mnt);
 }

This change is not need. It's left over from my testing to
reproduce the warning messages. Only the change in
nfsd4_do_async_copy is needed for the unmount problem.

-Dai

@@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
                copy->nf_src = kzalloc(sizeof(struct nfsd_file), GFP_KERNEL);
                if (!copy->nf_src) {
                        copy->nfserr = nfserr_serverfault;
- nfsd4_interssc_disconnect(copy->ss_mnt);
                        goto do_callback;
                }
                copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
&copy->stateid);
                if (IS_ERR(copy->nf_src->nf_file)) {
                        copy->nfserr = nfserr_offload_denied;
- nfsd4_interssc_disconnect(copy->ss_mnt);
                        goto do_callback;
                }
        }
@@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
                        &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD);
        nfsd4_run_cb(&cb_copy->cp_cb);
 out:
+       nfsd4_interssc_disconnect(copy->ss_mnt);
        if (!copy->cp_intra)
                kfree(copy->nf_src);
        cleanup_async_copy(copy);

But there is something new. I tried inter-server copy twice.
First time I can verify from tshark capture that a session was
created and destroy, along with all the NFS ops. On 2nd try,
I can

cannot

see any NFS ops from the tshark capture because all data
are encrypted by TLS/SSLv2. This behavior seems to repeat.
I will look into it but I think this is a separate issue.

-Dai


-Dai

I'm assuming to reproduce the use-after-free I
need to run with kazan, is that what you did Dai?

I did not use Kasan. I just did an inter-server copy and this message
showed up in /var/log/messages.

-Dai


--b.

-Dai

On 2/19/21 5:09 PM, J. Bruce Fields wrote:
Dai, do you have a copy of the original use-after-free warning?

--b.

On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
Hi Dai (Bruce),

This patch is what broke the mount that's now left behind between the source server and the destination server. We are no longer dropping the necessary reference on the mount to go away. I haven't been paying as much attention as I should have been to the changes. The original code called fput(src) so a simple refcount of the file. Then things got complicated and moved to nfsd_file_put(). So I don't understand complexity. But we need to do some kind of put to decrement the needed
reference on the superblock. Bruce any ideas? Can we go back to
fput()?

On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
The source file nfsd_file is not constructed the same as other
nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
called to free the object; nfsd_file_put is not the inverse of
kzalloc, instead kfree is called by nfsd4_do_async_copy when done.

Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
---
  fs/nfsd/nfs4proc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index ad2fa1a8e7ad..9c43cad7e408 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
                         struct nfsd_file *dst)
  {
         nfs42_ssc_close(src->nf_file);
-       nfsd_file_put(src);
+       /* 'src' is freed by nfsd4_do_async_copy */
         nfsd_file_put(dst);
         mntput(ss_mnt);
  }
--
2.20.1.1226.g1595ea5.dirty




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux