> On Sep 23, 2016, at 15:27, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>
> On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>
>>> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>
>>> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust
>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust
>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust
>>>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> If we instead bump the sequence number in the case of interrupted and do:
>>>>>>>>
>>>>>>>> You have no guarantees that the server has seen and processed the operation.
>>>>>>>
>>>>>>> That is correct. I have tested the patch and made the server never
>>>>>>> receive the operation, so the client has an interrupted slot. On the next
>>>>>>> operation the server will complain back with SEQ_MISORDERED. Client
>>>>>>> can recover from this operation. Client cannot recover from "Remote
>>>>>>> EIO".
>>>>>>>
>>>>>>
>>>>>> Why not?
>>>>>
>>>>> When the XDR layer returns EREMOTEIO it's not handled by the NFS error
>>>>> recovery (are you suggesting we should?) and returns that to the
>>>>> application.
>>>>>
>>>>
>>>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid.
>>>>
>>>
>>> I'm confused where your objection lies. Are you ok with bumping the
>>> sequence # when task->tk_status = 1 and saying that we should still
>>> keep the code that I deleted in the 2nd chunk of the patch that bumped
>>> the seqid on getting SEQ_MISORDERED due to a previously interrupted
>>> slot?
>>> Wouldn't that create a difference of 2 slots for the server that has
>>> received the original request?
>>>
>>
>> I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion.
>
> I don't understand what you are suggesting. I do better with an example,
> so allow me:
>
> REMOVE used slot 0 seq=00000036 received ctrl-c
> nfs41_sequence_done() gets called task->tk_status = 1:
> slot->interrupted is set to 1. slot is freed.
>
> next operation comes in, in my case it's ACCESS. initialization of the
> sequence uses slot 0 seq=00000036
> server replies with REMOVE
>
> client code xdr in decode_op_hdr() returns EREMOTEIO. decode_access()
> returns EREMOTEIO. handle error just returns that error.
>
> where do we retry?
>

The retry should be happening when we exit from nfs41_sequence_done() by restarting the RPC.
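
To make the retry rule discussed in this thread concrete, here is a minimal stand-alone C model of it. It is only a sketch under stated assumptions, not the kernel code: every name in it (model_slot, model_slot_interrupt, model_sequence_done, the seq_status and seq_action enums) is hypothetical and invented for illustration. It models just the client-side decision: an interrupted call leaves the slot seqid untouched and marks the slot, and a later SEQ_MISORDERED on a marked slot is answered by bumping the seqid and asking the caller to restart the RPC rather than returning an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values for this model only (not kernel constants). */
enum seq_status { SEQ_OK, SEQ_MISORDERED };

/* What the caller should do after the SEQUENCE result has been examined. */
enum seq_action { SEQ_ACTION_DONE, SEQ_ACTION_RETRY, SEQ_ACTION_FAIL };

/* Hypothetical, simplified stand-in for a client-side session slot. */
struct model_slot {
    uint32_t seq_nr;      /* seqid the client will send next on this slot */
    bool     interrupted; /* previous call on this slot was interrupted */
};

/*
 * The RPC was interrupted before a reply was seen. We do not know whether
 * the server processed it, so the seqid is left alone and the slot is
 * only marked.
 */
void model_slot_interrupt(struct model_slot *slot)
{
    assert(slot != NULL);
    slot->interrupted = true;
}

/*
 * Decide what to do with the result of the next call on the slot.
 * On SEQ_MISORDERED after an interrupt, bump the seqid and request a
 * restart instead of propagating an error.
 */
enum seq_action model_sequence_done(struct model_slot *slot,
                                    enum seq_status status)
{
    assert(slot != NULL);

    switch (status) {
    case SEQ_OK:
        /* Server accepted this seqid: advance and clear the mark. */
        slot->seq_nr++;
        slot->interrupted = false;
        return SEQ_ACTION_DONE;
    case SEQ_MISORDERED:
        if (slot->interrupted) {
            /* Assume the interrupted call consumed the old seqid. */
            slot->seq_nr++;
            slot->interrupted = false;
            return SEQ_ACTION_RETRY;
        }
        return SEQ_ACTION_FAIL;
    }
    return SEQ_ACTION_FAIL;
}
```

In this model a caller would loop while model_sequence_done() returns SEQ_ACTION_RETRY, resending the operation with the updated slot->seq_nr. The seqid is bumped only once per interrupt, which is the concern raised above about ending up two ahead of the server.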