On Fri, Sep 23, 2016 at 4:34 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > >> On Sep 23, 2016, at 16:25, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >> >> On Fri, Sep 23, 2016 at 4:07 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>> On Fri, Sep 23, 2016 at 3:57 PM, Trond Myklebust >>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>> >>>>> On Sep 23, 2016, at 15:27, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>>> >>>>> On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust >>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>>> >>>>>>> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>>>>> >>>>>>> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust >>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>>>>> >>>>>>>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>>>>>>> >>>>>>>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust >>>>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>>>>>>> >>>>>>>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>>>>>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> If we instead bump the sequence number in the case of interrupted and do: >>>>>>>>>>>> >>>>>>>>>>>> You have no guarantees that the server has seen and processed the operation. >>>>>>>>>>> >>>>>>>>>>> That is correct, i have tested the patch and made server never to >>>>>>>>>>> receive the operation and client have an interrupted slot. On the next >>>>>>>>>>> operation the server will complain back with SEQ_MISORDERED. Client >>>>>>>>>>> can recover from this operation. Client can not recover from "Remote >>>>>>>>>>> EIO”. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Why not? >>>>>>>>> >>>>>>>>> When XDR layer returns EREMOTEIO it's not handled by the NFS error >>>>>>>>> recovery (are you suggesting we should?) and returns that to the >>>>>>>>> application. >>>>>>>>> >>>>>>>> >>>>>>>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid. >>>>>>>> >>>>>>> >>>>>>> I'm confused where your objection lies. Are you ok with bumping the >>>>>>> sequence # when task->tk_status = 1 and saying that we should still >>>>>>> keep the code that I deleted in the 2nd chunk of the patch that bumped >>>>>>> the seqid on getting SEQ_MISORDERED due to a previously interrupted >>>>>>> slot? >>>>>>> Wouldn't that create a difference of 2 slots for the server that has >>>>>>> received the original request? >>>>>>> >>>>>> >>>>>> I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion. >>>>> >>>>> I'm not understand what you are suggestion. I do better with example >>>>> so allow me: >>>>> >>>>> REMOVE used slot 0 seq=00000036 received ctrl-c >>>>> nfs41_sequence_done() gets called task->tk_status = 1: >>>>> slot->interrupted is set to 1. slot is freed. >>>>> >>>>> next operation comes in, in my case it's ACCESS. initialization of the >>>>> sequence uses slot 0 seq=00000036 >>>>> server replies with REMOVE >>>>> >>>>> client code xdr in decode_op_hrs() returns EREMOTEIO. decode_access() >>>>> returns EREMOTEIO. handle error just returns that error. >>>>> >>>>> where do we retry? >>>>> >>>> >>>> The retry should be happening when we exit from nfs41_sequence_done() by restarting the RPC. >>> >>> Are you suggestion that REMOVE is retried? Ok I can see that (though >>> I'm not sure why a killed task suppose to be retried. Wasn't it killed >>> for a reason?). But if you are saying ACCESS should be retried then I >>> don't see how it can fit into the code flow. >> >> I'm still hung up on the fact your suggestion of "retry". There is no >> retry. You wrote "if we get a SEQ_MISORDERED" we never get >> "SEQ_MISORDERED". >> >> I can see if you want to add to error_handling that we check if error >> is EREMOTEIO and check that slot->interrupted is set, then we try? >> > > Yes, that’s what I’d hope to see. Ok I understand now and would try that. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html