On Mon, Feb 22, 2021 at 5:09 PM <dai.ngo@xxxxxxxxxx> wrote: > > > On 2/22/21 2:01 PM, dai.ngo@xxxxxxxxxx wrote: > > > > On 2/22/21 1:46 PM, dai.ngo@xxxxxxxxxx wrote: > >> > >> On 2/22/21 10:34 AM, dai.ngo@xxxxxxxxxx wrote: > >>> > >>> On 2/20/21 8:16 PM, dai.ngo@xxxxxxxxxx wrote: > >>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote: > >>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields > >>>>> <bfields@xxxxxxxxxxxx> wrote: > >>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, dai.ngo@xxxxxxxxxx wrote: > >>>>>>> If this is the cause why we don't drop the mount after the copy > >>>>>>> then I can restore the patch and look into this problem. > >>>>>>> Unfortunately, > >>>>>>> all my test machines are down for maintenance until Sunday/Monday. > >>>>>> I think we can take some time to figure out what's actually going on > >>>>>> here before reverting anything. > >>>>> Yes I agree. We need to fix the use-after-free and also make sure > >>>>> that > >>>>> reference will go away. > >>> > >>> I reverted the patch, verified the warning message is back: > >>> > >>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------ > >>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free. > >>> > >>> then did a inter-server copy and waited for more than 20 mins and > >>> the destination server still maintains the session with the source > >>> server. It must be some other references that prevent the mount > >>> to go away. > >> > >> This change fixed the unmount after inter-server copy problem: > >> > >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > >> index 8d6d2678abad..87687cd18938 100644 > >> --- a/fs/nfsd/nfs4proc.c > >> +++ b/fs/nfsd/nfs4proc.c > >> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount > >> *ss_mnt, struct nfsd_file *src, > >> struct nfsd_file *dst) > >> { > >> nfs42_ssc_close(src->nf_file); > >> - /* 'src' is freed by nfsd4_do_async_copy */ > >> + nfsd_file_put(src); > >> nfsd_file_put(dst); > >> mntput(ss_mnt); > >> } > > This change is not need. It's left over from my testing to > reproduce the warning messages. Only the change in > nfsd4_do_async_copy is needed for the unmount problem. > > -Dai > > >> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data) > >> copy->nf_src = kzalloc(sizeof(struct nfsd_file), > >> GFP_KERNEL); > >> if (!copy->nf_src) { > >> copy->nfserr = nfserr_serverfault; > >> - nfsd4_interssc_disconnect(copy->ss_mnt); > >> goto do_callback; > >> } > >> copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt, > >> ©->c_fh, > >> ©->stateid); > >> if (IS_ERR(copy->nf_src->nf_file)) { > >> copy->nfserr = nfserr_offload_denied; > >> - nfsd4_interssc_disconnect(copy->ss_mnt); > >> goto do_callback; > >> } > >> } > >> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data) > >> &nfsd4_cb_offload_ops, > >> NFSPROC4_CLNT_CB_OFFLOAD); > >> nfsd4_run_cb(&cb_copy->cp_cb); > >> out: > >> + nfsd4_interssc_disconnect(copy->ss_mnt); > >> if (!copy->cp_intra) > >> kfree(copy->nf_src); > >> cleanup_async_copy(copy); > >> > >> But there is something new. I tried inter-server copy twice. > >> First time I can verify from tshark capture that a session was > >> created and destroy, along with all the NFS ops. On 2nd try, > >> I can Hi Dai/Bruce, While I believe the fix works (as in the mount goes away), I'm not comfortable with this fix as I believe we will be leaking resources. Server calls nfs42_ssc_open() which creates a legit file pointer (yes it takes a reference on superblock but it also allocates memory for "file" structure). Normally a file structure requires doing an fput() which if I look thru the code does a bunch of things before in the end calling mntput(). While we free the copy->nf_src in nfs4_do_asyn_copy(), the copy->nf_src->nf_file was allocated separately and would have been freed from calling fput() on it. So I guess it's not correct to do kfree(copy->nf_src) in teh nfs4_do_async_copy() I think that's probably where use-after-free comes in. We need to keep it until the cleanup. I'm thinking perhaps call fput(copy->nf_src->nf_file) first and then free it? Just an idea but I haven't tested it.