Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

> On Jan 2, 2019, at 1:57 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
>>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
>>>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
>>>>>>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>> The test for rpcauth_xmit_need_reencode() happens when we
>>>>>> call
>>>>>> xprt_request_transmit() to actually put the RPC call on the
>>>>>> wire.
>>>>>> The
>>>>>> enqueue order should not be able to defeat that test.
>>>>>> 
>>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is
>>>>>> failing
>>>>>> because this is a retransmission after a disconnect/reconnect
>>>>>> that
>>>>>> didn't trigger a re-encode?
>>>>> 
>>>>> Actually, it might be worth a try to move the test for
>>>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>>>> req->rq_bytes_sent as that is just a minor optimisation.
>>>> 
>>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never
>>>> set
>>>> req->rq_bytes_sent to a non-zero value. The body of the "if"
>>>> statement
>>>> is always executed for those RPCs.
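>>>>
>>>> Paraphrasing the relevant hunk of xprt_request_transmit() from
>>>> memory (net/sunrpc/xprt.c; take the exact details with a grain
>>>> of salt):
>>>>
>>>>         if (!req->rq_bytes_sent) {
>>>>                 if (xprt_request_data_received(task)) {
>>>>                         status = 0;
>>>>                         goto out_dequeue;
>>>>                 }
>>>>                 /* Verify that our message lies in the RPCSEC_GSS window */
>>>>                 if (rpcauth_xmit_need_reencode(task)) {
>>>>                         status = -EBADMSG;
>>>>                         goto out_dequeue;
>>>>                 }
>>>>         }
>>>>
>>>> Because xprtrdma leaves rq_bytes_sent at zero, this entire block
>>>> already runs on every transmit attempt, so hoisting the
>>>> need_reencode test out of the enclosing "if" would be a no-op
>>>> for RDMA.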
>>>> 
>>> 
>>> Then the question is: what is defeating the call to
>>> rpcauth_xmit_need_reencode() in xprt_request_transmit(), causing it
>>> not to trigger in the misordered cases?
>> 
>> Here's a sample RPC/RDMA case.
>> 
>> My instrumented server reports this:
>> 
>> Jan  2 13:29:00 klimt kernel: gss_check_seq_num: dropped: seq_num=141220 sd->sd_max=141360
>> 
>> 
>> ftrace log on the client shows this:
>> 
>>  kworker/u28:12-2191  [004]   194.048534: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
>>  kworker/u28:12-2191  [004]   194.048534: xprt_transmit:        task:1761@5 xid=0x88f4f47c seqno=141220 status=-57
>>  kworker/u28:12-2191  [004]   194.048534: rpc_task_run_action:  task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 action=call_transmit_status
>>  kworker/u28:12-2191  [004]   194.048535: rpc_task_run_action:  task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit
>>  kworker/u28:12-2191  [004]   194.048535: rpc_task_sleep:       task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 queue=xprt_sending
>> 
>> 
>>  kworker/u28:12-2191  [004]   194.048552: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
>>  kworker/u28:12-2191  [004]   194.048557: xprt_transmit:        task:1761@5 xid=0x88f4f47c seqno=141220 status=0
>> 
>> 
>>  kworker/u28:12-2191  [004]   194.048559: rpcgss_need_reencode: task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode unneeded
>>  kworker/u28:12-2191  [004]   194.048563: xprt_transmit:        task:1902@5 xid=0x14f5f47c seqno=141360 status=0
>> 
>> 
>> Note the first need_reencode entry: the sequence numbers show that
>> the xmit queue has been significantly re-ordered. The request being
>> transmitted is already very close to the lower end of the GSS
>> sequence number window.
>> 
>> The server then re-orders these two slightly, because the first one
>> has some Read chunks that need to be pulled over while the second is
>> pure inline and can therefore be processed immediately. That is
>> enough to push the first one outside the GSS sequence number window.
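>>
>> For reference, the server's window check amounts to roughly the
>> following. This is a simplified model of gss_check_seq_num() in
>> net/sunrpc/auth_gss/svcauth_gss.c, with the locking, bitmap helpers,
>> and sequence wraparound handling omitted; GSS_SEQ_WIN is 128 there:
>>
>>         #include <string.h>
>>
>>         #define GSS_SEQ_WIN 128
>>
>>         struct seq_window {
>>                 unsigned int  sd_max;               /* highest seq accepted so far */
>>                 unsigned char sd_seen[GSS_SEQ_WIN]; /* replay marks, indexed mod window */
>>         };
>>
>>         /* Return 1 to accept the request, 0 to drop it. */
>>         static int seq_window_check(struct seq_window *sd, unsigned int seq)
>>         {
>>                 if (seq > sd->sd_max) {
>>                         if (seq - sd->sd_max >= GSS_SEQ_WIN) {
>>                                 /* Jumped a whole window ahead: start over. */
>>                                 memset(sd->sd_seen, 0, sizeof(sd->sd_seen));
>>                                 sd->sd_max = seq;
>>                         } else {
>>                                 /* Slide forward, clearing slots we pass. */
>>                                 while (sd->sd_max < seq)
>>                                         sd->sd_seen[++sd->sd_max % GSS_SEQ_WIN] = 0;
>>                         }
>>                         sd->sd_seen[seq % GSS_SEQ_WIN] = 1;
>>                         return 1;
>>                 }
>>                 if (sd->sd_max - seq >= GSS_SEQ_WIN)
>>                         return 0;       /* fell behind the window: dropped */
>>                 if (sd->sd_seen[seq % GSS_SEQ_WIN])
>>                         return 0;       /* replay */
>>                 sd->sd_seen[seq % GSS_SEQ_WIN] = 1;
>>                 return 1;
>>         }
>>
>> Plugging in the numbers from the log: sd_max had already advanced to
>> 141360 when seq_num 141220 arrived, and 141360 - 141220 = 140 is
>> beyond the 128-entry window, hence the drop.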
>> 
>> I haven't looked closely at the pathology of the TCP case.
> 
> Wait a minute... That's not OK. The client can't be expected to take
> into account reordering that happens on the server side.

Conversely, the client can't assume the transport and the server don't
re-order. This does not appear to be a problem for the v4.19 client:
I don't see disconnect storms with that client.
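
For anyone following along, the client-side check boils down to
roughly this. It is a simplified model of gss_xmit_need_reencode() in
net/sunrpc/auth_gss/auth_gss.c, ignoring the atomic update of
gc_seq_xmit and sequence wraparound:

        #include <stdbool.h>

        /* Re-encode only when this request's GSS sequence number has
         * fallen out of the transmit window: more than gc_win
         * (typically 128) behind the newest sequence number already
         * handed to the transport. */
        static bool need_reencode(unsigned int rq_seqno,
                                  unsigned int seq_xmit,
                                  unsigned int gc_win)
        {
                if (rq_seqno > seq_xmit)        /* newest so far */
                        return false;
                return seq_xmit - rq_seqno >= gc_win;
        }

At 194.048534 that test is correct from the client's point of view:
141336 - 141220 = 116, still inside a 128-entry window. But by the
time the server processes the request, its window has moved on to
141360, and 141220 is 140 behind. The client's check can pass at
transmit time and still lose the race on the server.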


--
Chuck Lever
