Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Nov 19, 2016 at 1:33 AM, NeilBrown <neilb@xxxxxxxx> wrote:
> On Sat, Nov 19 2016, Olga Kornievskaia wrote:
>
>> On Fri, Nov 18, 2016 at 9:32 AM, Trond Myklebust
>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>
>>>> On Nov 18, 2016, at 00:01, NeilBrown <neilb@xxxxxxxx> wrote:
>>>>
>>>>
>>>> Ping?
>>>> No sign of this in linux-next, and no replies….
>>>
>>> I’d like some confirmation from the NetApp folks on this. They are the main stakeholders for the pNFS files implementation. I don’t think we need an equivalent for flex files.
>>
>> Just to see if I understand this patch. If "ds" isn't NULL, then the
>> client has an RPC client with the ds. But it doesn't have a valid NFS
>> client? Looking at the code I really don't see how that can happen.
>
> I thought I explained that, but I was rather brief.
>
> nfs4_fl_prepare_ds() calls nfs4_pnfs_ds_connect() in order to create the
> NFS client.  The first time it is called it will set NFS4DS_CONNECTING
> and then call _nfs4_pnfs_v?_ds_connect() which will establish the
> connection and set ds->ds_clp.  If the server is unresponsive, this an
> block indefinitely.
>
> If a second thread then tries the same time, it will enter
> nfs4_pnfs_ds_connect() and will discovery that NFS4DS_CONNECTING is
> already set, so it will call nfs4_wait_ds_connect().
>
> As nfs4_wait_ds_connect() waits with TASK_KILLABLE, it will abort if the
> process is kill by an uncaught signal (such as SIGKILL).
> In this case it will return even though NFS4DS_CONNECTING is still set,
> and ds->ds_clp is still NULL.  i.e. the first thread hasn't established
> the connection yet.
>
> As the process has been killed, the 'ds' that it holds that doesn't have
> an NFS client handle will never be used.  But we need to be sure that
> the process will exit cleanly without trying to de-reference that NULL
> ds_clp.
>
> That is what the patch ensures.

Thanks for the explanation. Makes sense.

>> Maybe there is a bug elsewhere? If we return here, is there a chance
>> we are going to have a zombie RPC client with the ds? If that's not a
>> problem then I don't think there an issue to assume that if there is
>> no valid NFS client then we would want to return a NULL from
>> nfs4_fl_prepare_ds().
>
> It isn't a "zombie RPC client", but rather an unborn RPC client, which
> still has NFS4DS_CONNECTING set.
>
> Thanks,
> NeilBrown
>
>
>>
>>
>>>
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>>
>>>> On Fri, Oct 21 2016, NeilBrown wrote:
>>>>
>>>>>
>>>>> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL
>>>>> 'ds', then ds->ds_clp will also be non-NULL.
>>>>>
>>>>> This is not necessasrily true in the case when the process received a
>>>>> fatal signal while nfs4_pnfs_ds_connect is waiting in
>>>>> nfs4_wait_ds_connect().  In that case ->ds_clp may not be set, and the
>>>>> devid may not recently have been marked unavailable.
>>>>>
>>>>> So add a test for ->ds_clp == NULL and return NULL in that case.
>>>>>
>>>>> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
>>>>> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
>>>>> ---
>>>>> fs/nfs/filelayout/filelayoutdev.c | 3 ++-
>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
>>>>> index 4946ef40ba87..85ef38f9765f 100644
>>>>> --- a/fs/nfs/filelayout/filelayoutdev.c
>>>>> +++ b/fs/nfs/filelayout/filelayoutdev.c
>>>>> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
>>>>>                           s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
>>>>>
>>>>> out_test_devid:
>>>>> -    if (filelayout_test_devid_unavailable(devid))
>>>>> +    if (ret->ds_clp == NULL ||
>>>>> +        filelayout_test_devid_unavailable(devid))
>>>>>              ret = NULL;
>>>>> out:
>>>>>      return ret;
>>>>> --
>>>>> 2.10.1
>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux