> On Jan 2, 2019, at 2:24 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2019-01-02 at 14:06 -0500, Trond Myklebust wrote:
>> On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote:
>>> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
>>>>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>>>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>> The test for rpcauth_xmit_need_reencode() happens when we call
>>>>>>>> xprt_request_transmit() to actually put the RPC call on the
>>>>>>>> wire. The enqueue order should not be able to defeat that test.
>>>>>>>>
>>>>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is
>>>>>>>> failing because this is a retransmission after a
>>>>>>>> disconnect/reconnect that didn't trigger a re-encode?
>>>>>>>
>>>>>>> Actually, it might be worth a try to move the test for
>>>>>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>>>>>> req->rq_bytes_sent as that is just a minor optimisation.
>>>>>>
>>>>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never
>>>>>> set req->rq_bytes_sent to a non-zero value. The body of the "if"
>>>>>> statement is always executed for those RPCs.
>>>>>
>>>>> Then the question is what is defeating the call to
>>>>> rpcauth_xmit_need_reencode() in xprt_request_transmit() and
>>>>> causing it not to trigger in the misordered cases?
>>>>
>>>> Here's a sample RPC/RDMA case.
>>>>
>>>> My instrumented server reports this:
>>>>
>>>> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
>>>> seq_num=141220 sd->sd_max=141360
>>>>
>>>> ftrace log on the client shows this:
>>>>
>>>> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
>>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
>>>> reencode unneeded
>>>> kworker/u28:12-2191 [004] 194.048534: xprt_transmit:
>>>> task:1761@5 xid=0x88f4f47c seqno=141220 status=-57
>>>> kworker/u28:12-2191 [004] 194.048534: rpc_task_run_action:
>>>> task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV
>>>> status=-57 action=call_transmit_status
>>>> kworker/u28:12-2191 [004] 194.048535: rpc_task_run_action:
>>>> task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV
>>>> status=0 action=call_transmit
>>>> kworker/u28:12-2191 [004] 194.048535: rpc_task_sleep:
>>>> task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV
>>>> status=-11 timeout=0 queue=xprt_sending
>>>>
>>>> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
>>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
>>>> reencode unneeded
>>>> kworker/u28:12-2191 [004] 194.048557: xprt_transmit:
>>>> task:1761@5 xid=0x88f4f47c seqno=141220 status=0
>>>>
>>>> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
>>>> task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336
>>>> reencode unneeded
>>>> kworker/u28:12-2191 [004] 194.048563: xprt_transmit:
>>>> task:1902@5 xid=0x14f5f47c seqno=141360 status=0
>>>>
>>>> Note that first need_reencode: the sequence numbers show that the
>>>> xmit queue has been significantly re-ordered. The request being
>>>> transmitted is already very close to the lower end of the GSS
>>>> sequence number window.
>>>>
>>>> The server then re-orders these two slightly because the first one
>>>> had some Read chunks that needed to be pulled over; the second was
>>>> pure inline and therefore could be processed immediately. That is
>>>> enough to force the first one outside the GSS sequence number
>>>> window.
>>>>
>>>> I haven't looked closely at the pathology of the TCP case.
>>>
>>> Wait a minute... That's not OK. The client can't be expected to take
>>> into account reordering that happens on the server side.
>>
>> If that's the case, then we would need to halt transmission as soon
>> as we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how
>> to do that, since those windows are per session (i.e. per user).
>
> So here is something we probably could do: modify
> xprt_request_enqueue_transmit() to order the list in req->rq_xmit2 by
> req->rq_seqno.

Why not add " && !req->rq_seqno " to the third arm? Calls are already
enqueued in sequence number order.

> Since task->tk_owner is actually a pid, that's not a perfect
> solution, but we could further mitigate by modifying
> gss_xmit_need_reencode() to only allow transmission of requests that
> are within 2/3 of the window.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

--
Chuck Lever
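For readers following the thread: the earlier exchange about moving the
rpcauth_xmit_need_reencode() test outside the req->rq_bytes_sent check
is about where in xprt_request_transmit() the window test runs. Below is
a minimal userspace model of that control flow, not the kernel code: all
struct and function names are simplified stand-ins, the window test is a
toy with a fixed size, and the 141200 sequence number is invented for
the out-of-window case.

/* Model of the transmit-path placement question discussed above. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct req {
	unsigned int rq_bytes_sent;	/* always 0 for xprtrdma */
	unsigned int rq_seqno;		/* RPCSEC_GSS sequence number */
};

static unsigned int seq_xmit = 141336;	/* highest seqno already sent */

/* Stand-in for rpcauth_xmit_need_reencode(): true once this request
 * has fallen out of the (toy, fixed-size) sequence window. */
static bool xmit_need_reencode(const struct req *req)
{
	return seq_xmit - req->rq_seqno >= 128;
}

static int request_transmit(struct req *req)
{
	/* The suggestion above: run the re-encode test on every
	 * transmit, not only inside "if (req->rq_bytes_sent == 0)",
	 * so a TCP retransmission after a reconnect is covered too.
	 * For xprtrdma this changes nothing: rq_bytes_sent stays 0,
	 * so the old placement already ran the test every time. */
	if (xmit_need_reencode(req))
		return -EBADMSG;	/* caller must re-encode */

	/* ... otherwise put the RPC call on the wire ... */
	return 0;
}

int main(void)
{
	struct req in  = { .rq_seqno = 141220 };  /* 116 behind: inside */
	struct req out = { .rq_seqno = 141200 };  /* 136 behind: stale  */

	printf("in-window:  %d\n", request_transmit(&in));   /* 0 */
	printf("out-window: %d\n", request_transmit(&out));  /* -EBADMSG */
	return 0;
}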
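The server-side drop in the log above (seq_num=141220 against
sd->sd_max=141360) is the RPCSEC_GSS replay window at work. Here is a
minimal model of that sliding-window test; the 128-entry window matches
GSS_SEQ_WIN in the Linux server, but the struct and function names are
simplified and the duplicate-detection bitmap is omitted.

/* Model of the server's sequence-window check (cf. gss_check_seq_num
 * in net/sunrpc/auth_gss/svcauth_gss.c), simplified for illustration. */
#include <stdbool.h>
#include <stdio.h>

#define GSS_SEQ_WIN 128	/* replay window size used by the Linux server */

struct seq_data {
	unsigned int sd_max;	/* highest sequence number seen so far */
};

/* Return true if seq_num is acceptable, false if it must be dropped. */
static bool check_seq_num(struct seq_data *sd, unsigned int seq_num)
{
	if (seq_num > sd->sd_max) {
		sd->sd_max = seq_num;	/* slide the window forward */
		return true;
	}
	if (seq_num + GSS_SEQ_WIN <= sd->sd_max)
		return false;	/* below the bottom edge: drop */
	return true;	/* inside the window (duplicate bitmap omitted) */
}

int main(void)
{
	struct seq_data sd = { .sd_max = 141360 };

	/* The case from the server log: 141360 - 141220 = 140 >= 128. */
	printf("seq 141220: %s\n",
	       check_seq_num(&sd, 141220) ? "accepted" : "dropped");
	return 0;
}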
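Trond's first proposal, ordering the req->rq_xmit2 list by req->rq_seqno
in xprt_request_enqueue_transmit(), is essentially a sorted insert. A
sketch under simplified assumptions: a plain singly linked list stands
in for the kernel's list machinery, and struct req for struct rpc_rqst.

/* Sorted insert by GSS sequence number, so the oldest numbers go out
 * first and are less likely to fall off the bottom of the window. */
#include <stdio.h>

struct req {
	unsigned int rq_seqno;	/* RPCSEC_GSS sequence number */
	struct req *next;	/* stands in for req->rq_xmit2 */
};

/* Insert 'new' keeping the queue sorted by ascending rq_seqno. */
static void enqueue_transmit(struct req **head, struct req *new)
{
	struct req **pos = head;

	while (*pos && (*pos)->rq_seqno <= new->rq_seqno)
		pos = &(*pos)->next;
	new->next = *pos;
	*pos = new;
}

int main(void)
{
	struct req a = { .rq_seqno = 141360 };	/* seqnos from the trace */
	struct req b = { .rq_seqno = 141220 };
	struct req *q = NULL;

	enqueue_transmit(&q, &a);	/* arrive out of order */
	enqueue_transmit(&q, &b);

	for (struct req *r = q; r; r = r->next)
		printf("xmit seqno=%u\n", r->rq_seqno);	/* 141220, 141360 */
	return 0;
}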
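The second mitigation, having gss_xmit_need_reencode() refuse requests
that are more than 2/3 of the way down the window, might look roughly
like this; the fixed window size and all names here are assumptions for
illustration. Note that under this policy the first request in the
trace (116 entries behind in a 128-entry window, past the 85-entry
threshold) would have been re-encoded rather than sent at the window
edge, leaving headroom for the server-side Read-chunk reordering
described above.

/* Sketch of the proposed 2/3-of-window transmit gate. */
#include <stdbool.h>
#include <stdio.h>

#define GSS_SEQ_WIN 128	/* assumed known to the client */

/* seq_xmit: highest GSS sequence number already handed to the
 * transport; rq_seqno: this request's number. Re-encode once the
 * request has drifted more than 2/3 of the window behind. */
static bool xmit_need_reencode(unsigned int seq_xmit,
			       unsigned int rq_seqno)
{
	return seq_xmit - rq_seqno > (2 * GSS_SEQ_WIN) / 3;
}

int main(void)
{
	/* The request from the trace: 141336 - 141220 = 116 > 85. */
	printf("reencode %s\n",
	       xmit_need_reencode(141336, 141220) ? "needed" : "unneeded");
	return 0;
}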