Re: handling error on RECLAIM_COMPLETE

Olga Kornievskaia <aglo@xxxxxxxxx> · Tue, 24 Jan 2017 15:48:17 -0500



On Tue, Jan 24, 2017 at 3:35 PM, Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
>
>> On Jan 24, 2017, at 14:50, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>
>> On Tue, Jan 24, 2017 at 2:44 PM, Trond Myklebust
>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>
>>>> On Jan 24, 2017, at 14:40, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>
>>>> On Tue, Jan 24, 2017 at 2:12 PM, Trond Myklebust
>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> On Jan 24, 2017, at 14:06, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi Trond,
>>>>>>
>>>>>> Is there a reason that nfs4_proc_reclaim_complete() isn't  wrapped
>>>>>> with a do while() to handle errors?
>>>>>
>>>>> What do we not already handle correctly in nfs4_reclaim_complete_done()?
>>>>
>>>> Could this be because when an error occurs rpc_done isn't called
>>>> (rpc_release is called)? What I see is that if RECLAIM_COMPLETE gets
>>>> an error (BAD_SESSION) the client just ignores it.
>>>>
>>>
>>> That’s correct. Why do we need to handle BAD_SESSION there? We’re done with state recovery, so if the server rebooted, we can catch that later.
>>
>> (1) don't we want to handle session errors as soon as possible?
>> (2) I ran into a problem (not sure yet if reproducible) where I had a
>> client stuck in an infinite loop of RECLAIM_COMPLETE being sent with
>> reply of BAD_SESSION.
>>
>> yes I don't know why the client is looping but it made me look into
>> the fact that we are not handling session errors on reclaim complete
>> which I simulated by having the server return BAD_SESSION to
>> RECLAIM_COMPLETE and I see that client simply ignores it.
>
> It doesn’t ignore it:
>
> static int nfs41_reclaim_complete_handle_errors(struct rpc_task *task, struct nfs_client *clp)
> {
>         switch(task->tk_status) {
>         case 0:
>         case -NFS4ERR_COMPLETE_ALREADY:
>         case -NFS4ERR_WRONG_CRED: /* What to do here? */
>                 break;
>         case -NFS4ERR_DELAY:
>                 rpc_delay(task, NFS4_POLL_RETRY_MAX);
>                 /* fall through */
>         case -NFS4ERR_RETRY_UNCACHED_REP:
>                 return -EAGAIN;
>         default:
>                 nfs4_schedule_lease_recovery(clp);
>         }
>         return 0;
> }
>
> IOW: what the code does is schedule another round of lease recovery.

We already agreed that this doesn't get called because rpc_call_done
isn't called on the error.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html