Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Feb 25, 2021, at 1:58 PM, dai.ngo@xxxxxxxxxx wrote:
> 
> 
> On 2/24/21 6:26 PM, dai.ngo@xxxxxxxxxx wrote:
>> Hi Olga and Bruce,
>> 
>> On 2/24/21 2:35 PM, Olga Kornievskaia wrote:
>>> On Mon, Feb 22, 2021 at 5:09 PM <dai.ngo@xxxxxxxxxx> wrote:
>>>> 
>>>> On 2/22/21 2:01 PM, dai.ngo@xxxxxxxxxx wrote:
>>>>> On 2/22/21 1:46 PM, dai.ngo@xxxxxxxxxx wrote:
>>>>>> On 2/22/21 10:34 AM, dai.ngo@xxxxxxxxxx wrote:
>>>>>>> On 2/20/21 8:16 PM, dai.ngo@xxxxxxxxxx wrote:
>>>>>>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>>>>>>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>>>>>>>>> <bfields@xxxxxxxxxxxx> wrote:
>>>>>>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, dai.ngo@xxxxxxxxxx wrote:
>>>>>>>>>>> If this is the cause why we don't drop the mount after the copy
>>>>>>>>>>> then I can restore the patch and look into this problem.
>>>>>>>>>>> Unfortunately,
>>>>>>>>>>> all my test machines are down for maintenance until Sunday/Monday.
>>>>>>>>>> I think we can take some time to figure out what's actually going on
>>>>>>>>>> here before reverting anything.
>>>>>>>>> Yes I agree. We need to fix the use-after-free and also make sure
>>>>>>>>> that
>>>>>>>>> reference will go away.
>>>>>>> I reverted the patch, verified the warning message is back:
>>>>>>> 
>>>>>>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
>>>>>>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
>>>>>>> 
>>>>>>> then did a inter-server copy and waited for more than 20 mins and
>>>>>>> the destination server still maintains the session with the source
>>>>>>> server.  It must be some other references that prevent the mount
>>>>>>> to go away.
>>>>>> This change fixed the unmount after inter-server copy problem:
>>>>>> 
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index 8d6d2678abad..87687cd18938 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>>>> *ss_mnt, struct nfsd_file *src,
>>>>>>                          struct nfsd_file *dst)
>>>>>>   {
>>>>>>          nfs42_ssc_close(src->nf_file);
>>>>>> -       /* 'src' is freed by nfsd4_do_async_copy */
>>>>>> +       nfsd_file_put(src);
>>>>>>          nfsd_file_put(dst);
>>>>>>          mntput(ss_mnt);
>>>>>>   }
>>>> This change is not need. It's left over from my testing to
>>>> reproduce the warning messages. Only the change in
>>>> nfsd4_do_async_copy is needed for the unmount problem.
>>>> 
>>>> -Dai
>>>> 
>>>>>> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>                  copy->nf_src = kzalloc(sizeof(struct nfsd_file),
>>>>>> GFP_KERNEL);
>>>>>>                  if (!copy->nf_src) {
>>>>>>                          copy->nfserr = nfserr_serverfault;
>>>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>                          goto do_callback;
>>>>>>                  }
>>>>>>                  copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
>>>>>> &copy->c_fh,
>>>>>> &copy->stateid);
>>>>>>                  if (IS_ERR(copy->nf_src->nf_file)) {
>>>>>>                          copy->nfserr = nfserr_offload_denied;
>>>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>                          goto do_callback;
>>>>>>                  }
>>>>>>          }
>>>>>> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>                          &nfsd4_cb_offload_ops,
>>>>>> NFSPROC4_CLNT_CB_OFFLOAD);
>>>>>>          nfsd4_run_cb(&cb_copy->cp_cb);
>>>>>>   out:
>>>>>> +       nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>          if (!copy->cp_intra)
>>>>>>                  kfree(copy->nf_src);
>>>>>>          cleanup_async_copy(copy);
>>>>>> 
>>>>>> But there is something new. I tried inter-server copy twice.
>>>>>> First time I can verify from tshark capture that a session was
>>>>>> created and destroy, along with all the NFS ops. On 2nd try,
>>>>>> I can
>>> Hi Dai/Bruce,
>>> 
>>> While I believe the fix works (as in the mount goes away), I'm not
>>> comfortable with this fix as I believe we will be leaking resources.
>>> Server calls nfs42_ssc_open() which creates a legit file pointer (yes
>>> it takes a reference on superblock but it also allocates memory for
>>> "file" structure). Normally a file structure requires doing an fput()
>>> which if I look thru the code does a bunch of things before in the end
>>> calling mntput(). While we free the copy->nf_src in
>>> nfs4_do_asyn_copy(), the copy->nf_src->nf_file was allocated
>>> separately and would have been freed from calling fput() on it.
>>> 
>>> So I guess it's not correct to do kfree(copy->nf_src) in teh
>>> nfs4_do_async_copy() I think that's probably where use-after-free
>>> comes in. We need to keep it until the cleanup. I'm thinking perhaps
>>> call fput(copy->nf_src->nf_file) first and then free it? Just an idea
>>> but I haven't tested it.
>> 
>> I think the unmount can be treated separately and the fix seems
>> to be valid for this issue. I think kfree(copy->nf_src) is called
>> after nfsd4_cleanup_inter_ssc so it's not the reason for the
>> 'use-after-free' warning. A quick look at nfsd4_do_async_copy
>> I see nf_src was kzalloc'ed but no ref count added to it.
>> 
>> I will review the whole cleanup part and report back.
> 
> I think this would fix the resource leak problem:
> 
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 57b3821d975a..742fc128fdc8 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -405,6 +405,11 @@ static void __nfs42_ssc_close(struct file *filep)
>        struct nfs_open_context *ctx = nfs_file_open_context(filep);
>         ctx->state->flags = 0;
> +       if (!filep)
> +               return;
> +       get_file(filep);
> +       filp_close(filep, NULL);
> +       fput(filep);
> }
>  static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = {
> 
> -Dai

I'm not clear on the final outcome. Is a code change needed?
If so, can someone post a patch (with a full description and
Signed-off-by) or point me to one that's been posted already?

Thanks!

--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux