Re: [PATCH 1/1] NFSD: fix error handling in callbacks

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Thu, 11 Mar 2021 15:10:09 +0000

> On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote:
> 
> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@xxxxxxxxxx> wrote:
>> 
>> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
>>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
>>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
>>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
>>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
>>>>>> wrote:
>>>>>>> From: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>>> 
>>>>>>> When the server tries to do a callback and a client fails it
>>>>>>> due to
>>>>>>> authentication problems, we need the server to set callback
>>>>>>> down
>>>>>>> flag in RENEW so that client can recover.
>>>>>> 
>>>>>> I was looking at this.  It looks to me like this should really be
>>>>>> just:
>>>>>> 
>>>>>>        case 1:
>>>>>>                if (task->tk_status)
>>>>>>                        nfsd4_mark_cb_down(clp, task->tk_status);
>>>>>> 
>>>>>> If tk_status showed an error, and the ->done method doesn't
>>>>>> return 0
>>>>>> to
>>>>>> tell us it something worth retrying, then the callback failed
>>>>>> permanently, so we should mark the callback path down, regardless
>>>>>> of
>>>>>> the
>>>>>> exact error.
>>>>> 
>>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
>>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
>>>>> process of returning the delegation being recalled. Why should that
>>>>> result in the callback channel being marked as down?
>>>>> 
>>>> 
>>>> Are you talking about say the connection going down and server should
>>>> just reconnect instead of recovering the callback channel. I assumed
>>>> that connection break is something that's not  recoverable by the
>>>> callback but perhaps I'm wrong.
>>> 
>>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
>>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
>>> not seeing why either of those errors should be handled by marking the
>>> callback channel as being down.
>>> 
>>> Looking further, it seems that the same function will also return '1'
>>> without checking the value of task->tk_status if the delegation has
>>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
>>> could trigger the call to nfsd4_mark_cb_down() with the above change.
>> 
>> Yeah, OK, that's wrong, apologies.
>> 
>> I'm just a little worried about the attempt to enumerate transport level
>> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
>> the right list?
> 
> Looking at call_transmit_status error handling, I don't think
> connection errors are returned. Instead the code tries to fix the
> connection by retrying unless the rpc_timeout is reached and then only
> EIO,TIMEDOUT is returned.
> 
> Can then my original patch be considered without resubmission?

Bruce has authorized v1 of this patch, but that one has the
uncorrected patch description. Post a v4?

>> --b.
>> 
>>> 
>>>> 
>>>>>> 
>>>>>> --b.
>>>>>> 
>>>>>>> 
>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>>> ---
>>>>>>> fs/nfsd/nfs4callback.c | 1 +
>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>> 
>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>> index 052be5bf9ef5..7325592b456e 100644
>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
>>>>>>> *task, void *calldata)
>>>>>>>                switch (task->tk_status) {
>>>>>>>                case -EIO:
>>>>>>>                case -ETIMEDOUT:
>>>>>>> +               case -EACCES:
>>>>>>>                        nfsd4_mark_cb_down(clp, task-
>>>>>>>> tk_status);
>>>>>>>                }
>>>>>>>                break;
>>>>>>> --
>>>>>>> 2.27.0
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Trond Myklebust
>>>>> Linux NFS client maintainer, Hammerspace
>>>>> trond.myklebust@xxxxxxxxxxxxxxx
>>>>> 
>>>>> 
>>> 
>>> --
>>> Trond Myklebust
>>> Linux NFS client maintainer, Hammerspace
>>> trond.myklebust@xxxxxxxxxxxxxxx

--
Chuck Lever