Re: reuse of slot and seq# when RPC was interrupted

Olga Kornievskaia <aglo@xxxxxxxxx> · Fri, 23 Sep 2016 15:27:45 -0400

On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
>
>> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>
>> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust
>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>
>>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>
>>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust
>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust
>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> If we instead bump the sequence number in the case of interrupted and do:
>>>>>>>
>>>>>>> You have no guarantees that the server has seen and processed the operation.
>>>>>>
>>>>>> That is correct, i have tested the patch and made server never to
>>>>>> receive the operation and client have an interrupted slot. On the next
>>>>>> operation the server will complain back with SEQ_MISORDERED. Client
>>>>>> can recover from this operation. Client can not recover from "Remote
>>>>>> EIO”.
>>>>>>
>>>>>
>>>>> Why not?
>>>>
>>>> When XDR layer returns EREMOTEIO it's not handled by the NFS error
>>>> recovery (are you suggesting we should?)  and returns that to the
>>>> application.
>>>>
>>>
>>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid.
>>>
>>
>> I'm confused where your objection lies. Are you ok with bumping the
>> sequence # when task->tk_status = 1 and saying that we should still
>> keep the code that I deleted in the 2nd chunk of the patch that bumped
>> the seqid on getting SEQ_MISORDERED due to a previously interrupted
>> slot?
>> Wouldn't that create a difference of 2 slots for the server that has
>> received the original request?
>>
>
> I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion.

I'm not understand what you are suggestion. I do better with example
so allow me:

REMOVE used slot 0 seq=00000036 received ctrl-c
nfs41_sequence_done() gets called task->tk_status = 1:
slot->interrupted is set to 1. slot is freed.

next operation comes in, in my case it's ACCESS. initialization of the
sequence uses slot 0 seq=00000036
server replies with REMOVE

client code xdr in decode_op_hrs() returns EREMOTEIO. decode_access()
returns EREMOTEIO. handle error just returns that error.

where do we retry?
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html