Re: reuse of slot and seq# when RPC was interrupted

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 23, 2016 at 4:34 PM, Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
>
>> On Sep 23, 2016, at 16:25, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>
>> On Fri, Sep 23, 2016 at 4:07 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>> On Fri, Sep 23, 2016 at 3:57 PM, Trond Myklebust
>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>>> On Sep 23, 2016, at 15:27, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust
>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust
>>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust
>>>>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust
>>>>>>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we instead bump the sequence number in the case of interrupted and do:
>>>>>>>>>>>>
>>>>>>>>>>>> You have no guarantees that the server has seen and processed the operation.
>>>>>>>>>>>
>>>>>>>>>>> That is correct, i have tested the patch and made server never to
>>>>>>>>>>> receive the operation and client have an interrupted slot. On the next
>>>>>>>>>>> operation the server will complain back with SEQ_MISORDERED. Client
>>>>>>>>>>> can recover from this operation. Client can not recover from "Remote
>>>>>>>>>>> EIO”.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why not?
>>>>>>>>>
>>>>>>>>> When XDR layer returns EREMOTEIO it's not handled by the NFS error
>>>>>>>>> recovery (are you suggesting we should?)  and returns that to the
>>>>>>>>> application.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid.
>>>>>>>>
>>>>>>>
>>>>>>> I'm confused where your objection lies. Are you ok with bumping the
>>>>>>> sequence # when task->tk_status = 1 and saying that we should still
>>>>>>> keep the code that I deleted in the 2nd chunk of the patch that bumped
>>>>>>> the seqid on getting SEQ_MISORDERED due to a previously interrupted
>>>>>>> slot?
>>>>>>> Wouldn't that create a difference of 2 slots for the server that has
>>>>>>> received the original request?
>>>>>>>
>>>>>>
>>>>>> I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion.
>>>>>
>>>>> I'm not understand what you are suggestion. I do better with example
>>>>> so allow me:
>>>>>
>>>>> REMOVE used slot 0 seq=00000036 received ctrl-c
>>>>> nfs41_sequence_done() gets called task->tk_status = 1:
>>>>> slot->interrupted is set to 1. slot is freed.
>>>>>
>>>>> next operation comes in, in my case it's ACCESS. initialization of the
>>>>> sequence uses slot 0 seq=00000036
>>>>> server replies with REMOVE
>>>>>
>>>>> client code xdr in decode_op_hrs() returns EREMOTEIO. decode_access()
>>>>> returns EREMOTEIO. handle error just returns that error.
>>>>>
>>>>> where do we retry?
>>>>>
>>>>
>>>> The retry should be happening when we exit from nfs41_sequence_done() by restarting the RPC.
>>>
>>> Are you suggestion that REMOVE is retried? Ok I can see that (though
>>> I'm not sure why a killed task suppose to be retried. Wasn't it killed
>>> for a reason?). But if you are saying ACCESS should be retried then I
>>> don't see how it can fit into the code flow.
>>
>> I'm still hung up on the fact your suggestion of "retry". There is no
>> retry. You wrote "if we get a SEQ_MISORDERED" we never get
>> "SEQ_MISORDERED".
>>
>> I can see if you want to add to error_handling that we check if error
>> is EREMOTEIO and check that slot->interrupted is set, then we try?
>>
>
> Yes, that’s what I’d hope to see.

Ok I understand now and would try that.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux