> On Jan 2, 2019, at 1:57 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
>>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>> The test for rpcauth_xmit_need_reencode() happens when we call
>>>>>> xprt_request_transmit() to actually put the RPC call on the wire.
>>>>>> The enqueue order should not be able to defeat that test.
>>>>>>
>>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is
>>>>>> failing because this is a retransmission after a
>>>>>> disconnect/reconnect that didn't trigger a re-encode?
>>>>>
>>>>> Actually, it might be worth a try to move the test for
>>>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>>>> req->rq_bytes_sent, as that is just a minor optimisation.
>>>>
>>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never
>>>> set req->rq_bytes_sent to a non-zero value. The body of the "if"
>>>> statement is always executed for those RPCs.
>>>
>>> Then the question is what is defeating the call to
>>> rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing
>>> it not to trigger in the misordered cases?
>>
>> Here's a sample RPC/RDMA case.
>>
>> My instrumented server reports this:
>>
>> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped: seq_num=141220 sd->sd_max=141360
>>
>> ftrace log on the client shows this:
>>
>> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
>> kworker/u28:12-2191 [004] 194.048534: xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 status=-57
>> kworker/u28:12-2191 [004] 194.048534: rpc_task_run_action: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 action=call_transmit_status
>> kworker/u28:12-2191 [004] 194.048535: rpc_task_run_action: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit
>> kworker/u28:12-2191 [004] 194.048535: rpc_task_sleep: task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 queue=xprt_sending
>>
>> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
>> kworker/u28:12-2191 [004] 194.048557: xprt_transmit: task:1761@5 xid=0x88f4f47c seqno=141220 status=0
>>
>> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode: task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode unneeded
>> kworker/u28:12-2191 [004] 194.048563: xprt_transmit: task:1902@5 xid=0x14f5f47c seqno=141360 status=0
>>
>> Note the first need_reencode: the sequence numbers show that the xmit
>> queue has been significantly re-ordered. The request being transmitted
>> is already very close to the lower end of the GSS sequence number
>> window.
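
To put numbers on that window arithmetic, here is a stand-alone sketch of
the client-side test. This is an illustration, not the kernel code: the
128-entry window and the helper name are my assumptions, though both are
consistent with the trace above.

	/* window.c: illustration only, not net/sunrpc code */
	#include <stdbool.h>
	#include <stdio.h>

	#define GSS_SEQ_WIN 128		/* assumed window size */

	/* Client side: has rq_seqno fallen out of the window that ends
	 * at seq_xmit, the newest sequence number handed out so far? */
	static bool xmit_need_reencode(unsigned int rq_seqno,
				       unsigned int seq_xmit)
	{
		return seq_xmit - rq_seqno >= GSS_SEQ_WIN;
	}

	int main(void)
	{
		/* Values from the first rpcgss_need_reencode entry:
		 * 141336 - 141220 = 116 < 128, so no re-encode, but
		 * only 12 slots remain before the request expires. */
		printf("reencode %s\n",
		       xmit_need_reencode(141220, 141336) ?
		       "needed" : "unneeded");
		return 0;
	}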
>>
>> The server then re-orders these two slightly, because the first one
>> had some Read chunks that need to be pulled over, while the second was
>> pure inline and therefore could be processed immediately. That is
>> enough to force the first one outside the GSS sequence number window.
>>
>> I haven't looked closely at the pathology of the TCP case.
>
> Wait a minute... That's not OK. The client can't be expected to take
> into account reordering that happens on the server side.

Conversely, the client can't assume the transport and the server don't
re-order.

This does not appear to be a problem for the v4.19 client: I don't see
disconnect storms with that client.
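
The server side of the same arithmetic, under the same assumptions (again
an illustration; the names simply mirror the gss_check_seq_num message in
the log above, not the actual implementation):

	/* replay.c: illustration only, not net/sunrpc code */
	#include <stdbool.h>
	#include <stdio.h>

	#define GSS_SEQ_WIN 128		/* assumed window size */

	/* Server side: is seq_num too far behind sd_max, the highest
	 * sequence number the server has seen, to fit in the window? */
	static bool gss_seq_num_dropped(unsigned int seq_num,
					unsigned int sd_max)
	{
		return sd_max >= GSS_SEQ_WIN &&
		       seq_num <= sd_max - GSS_SEQ_WIN;
	}

	int main(void)
	{
		/* The pure-inline request was processed first and pushed
		 * sd_max to 141360; 141360 - 141220 = 140 > 128, so the
		 * re-ordered request with seq_num=141220 is dropped. */
		printf("%s\n", gss_seq_num_dropped(141220, 141360) ?
		       "dropped" : "accepted");
		return 0;
	}

--
Chuck Lever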