> On Jan 2, 2019, at 2:24 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2019-01-02 at 14:06 -0500, Trond Myklebust wrote:
>> On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote:
>>> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
>>>>> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>>>>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>>>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>> The test for rpcauth_xmit_need_reencode() happens when we call
>>>>>>>> xprt_request_transmit() to actually put the RPC call on the
>>>>>>>> wire. The enqueue order should not be able to defeat that test.
>>>>>>>>
>>>>>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is
>>>>>>>> failing because this is a retransmission after a
>>>>>>>> disconnect/reconnect that didn't trigger a re-encode?
>>>>>>>
>>>>>>> Actually, it might be worth a try to move the test for
>>>>>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>>>>>> req->rq_bytes_sent as that is just a minor optimisation.
>>>>>>
>>>>>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never
>>>>>> set req->rq_bytes_sent to a non-zero value. The body of the "if"
>>>>>> statement is always executed for those RPCs.
>>>>>
>>>>> Then the question is what is defeating the call to
>>>>> rpcauth_xmit_need_reencode() in xprt_request_transmit() and
>>>>> causing it not to trigger in the misordered cases?
>>>>
>>>> Here's a sample RPC/RDMA case.
>>>>
>>>> My instrumented server reports this:
>>>>
>>>> Jan 2 13:29:00 klimt kernel: gss_check_seq_num: dropped:
>>>> seq_num=141220 sd->sd_max=141360
>>>>
>>>> ftrace log on the client shows this:
>>>>
>>>> kworker/u28:12-2191 [004] 194.048534: rpcgss_need_reencode:
>>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
>>>> reencode unneeded
>>>> kworker/u28:12-2191 [004] 194.048534: xprt_transmit:
>>>> task:1761@5 xid=0x88f4f47c seqno=141220 status=-57
>>>> kworker/u28:12-2191 [004] 194.048534: rpc_task_run_action:
>>>> task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV
>>>> status=-57 action=call_transmit_status
>>>> kworker/u28:12-2191 [004] 194.048535: rpc_task_run_action:
>>>> task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV
>>>> status=0 action=call_transmit
>>>> kworker/u28:12-2191 [004] 194.048535: rpc_task_sleep:
>>>> task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV
>>>> status=-11 timeout=0 queue=xprt_sending
>>>>
>>>> kworker/u28:12-2191 [004] 194.048552: rpcgss_need_reencode:
>>>> task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336
>>>> reencode unneeded
>>>> kworker/u28:12-2191 [004] 194.048557: xprt_transmit:
>>>> task:1761@5 xid=0x88f4f47c seqno=141220 status=0
>>>>
>>>> kworker/u28:12-2191 [004] 194.048559: rpcgss_need_reencode:
>>>> task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336
>>>> reencode unneeded
>>>> kworker/u28:12-2191 [004] 194.048563: xprt_transmit:
>>>> task:1902@5 xid=0x14f5f47c seqno=141360 status=0
>>>>
>>>> Note that first need_reencode: the sequence numbers show that the
>>>> xmit queue has been significantly re-ordered. The request being
>>>> transmitted is already very close to the lower end of the GSS
>>>> sequence number window.
>>>>
>>>> The server then re-orders these two slightly because the first one
>>>> had some Read chunks that needed to be pulled over; the second was
>>>> pure inline and therefore could be processed immediately. That is
>>>> enough to force the first one outside the GSS sequence number
>>>> window.
>>>>
>>>> I haven't looked closely at the pathology of the TCP case.
>>>
>>> Wait a minute... That's not OK. The client can't be expected to take
>>> into account reordering that happens on the server side.
>>
>> If that's the case, then we would need to halt transmission as soon
>> as we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how
>> to do that, since those windows are per session (i.e. per user).
>
> So here is something we probably could do: modify
> xprt_request_enqueue_transmit() to order the list in req->rq_xmit2 by
> req->rq_seqno.

Why not add " && !req->rq_seqno " to the third arm? Calls are already
enqueued in sequence number order.

> Since task->tk_owner is actually a pid, that's not a perfect
> solution, but we could further mitigate by modifying
> gss_xmit_need_reencode() to only allow transmission of requests that
> are within 2/3 of the window.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

--
Chuck Lever
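For readers following the thread: the earlier exchange about moving the
rpcauth_xmit_need_reencode() test outside the req->rq_bytes_sent check
is about where in xprt_request_transmit() the window test runs. Below is
a minimal userspace model of that control flow, not the kernel code: all
struct and function names are simplified stand-ins, the window test is a
toy with a fixed size, and the 141200 sequence number is invented for
the out-of-window case.

/* Model of the transmit-path placement question discussed above. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct req {
	unsigned int rq_bytes_sent;	/* always 0 for xprtrdma */
	unsigned int rq_seqno;		/* RPCSEC_GSS sequence number */
};

static unsigned int seq_xmit = 141336;	/* highest seqno already sent */

/* Stand-in for rpcauth_xmit_need_reencode(): true once this request
 * has fallen out of the (toy, fixed-size) sequence window. */
static bool xmit_need_reencode(const struct req *req)
{
	return seq_xmit - req->rq_seqno >= 128;
}

static int request_transmit(struct req *req)
{
	/* The suggestion above: run the re-encode test on every
	 * transmit, not only inside "if (req->rq_bytes_sent == 0)",
	 * so a TCP retransmission after a reconnect is covered too.
	 * For xprtrdma this changes nothing: rq_bytes_sent stays 0,
	 * so the old placement already ran the test every time. */
	if (xmit_need_reencode(req))
		return -EBADMSG;	/* caller must re-encode */

	/* ... otherwise put the RPC call on the wire ... */
	return 0;
}

int main(void)
{
	struct req in  = { .rq_seqno = 141220 };  /* 116 behind: inside */
	struct req out = { .rq_seqno = 141200 };  /* 136 behind: stale  */

	printf("in-window:  %d\n", request_transmit(&in));   /* 0 */
	printf("out-window: %d\n", request_transmit(&out));  /* -EBADMSG */
	return 0;
}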
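The server-side drop in the log above (seq_num=141220 against
sd->sd_max=141360) is the RPCSEC_GSS replay window at work. Here is a
minimal model of that sliding-window test; the 128-entry window matches
GSS_SEQ_WIN in the Linux server, but the struct and function names are
simplified and the duplicate-detection bitmap is omitted.

/* Model of the server's sequence-window check (cf. gss_check_seq_num
 * in net/sunrpc/auth_gss/svcauth_gss.c), simplified for illustration. */
#include <stdbool.h>
#include <stdio.h>

#define GSS_SEQ_WIN 128	/* replay window size used by the Linux server */

struct seq_data {
	unsigned int sd_max;	/* highest sequence number seen so far */
};

/* Return true if seq_num is acceptable, false if it must be dropped. */
static bool check_seq_num(struct seq_data *sd, unsigned int seq_num)
{
	if (seq_num > sd->sd_max) {
		sd->sd_max = seq_num;	/* slide the window forward */
		return true;
	}
	if (seq_num + GSS_SEQ_WIN <= sd->sd_max)
		return false;	/* below the bottom edge: drop */
	return true;	/* inside the window (duplicate bitmap omitted) */
}

int main(void)
{
	struct seq_data sd = { .sd_max = 141360 };

	/* The case from the server log: 141360 - 141220 = 140 >= 128. */
	printf("seq 141220: %s\n",
	       check_seq_num(&sd, 141220) ? "accepted" : "dropped");
	return 0;
}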
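Trond's first proposal, ordering the req->rq_xmit2 list by req->rq_seqno
in xprt_request_enqueue_transmit(), is essentially a sorted insert. A
sketch under simplified assumptions: a plain singly linked list stands
in for the kernel's list machinery, and struct req for struct rpc_rqst.

/* Sorted insert by GSS sequence number, so the oldest numbers go out
 * first and are less likely to fall off the bottom of the window. */
#include <stdio.h>

struct req {
	unsigned int rq_seqno;	/* RPCSEC_GSS sequence number */
	struct req *next;	/* stands in for req->rq_xmit2 */
};

/* Insert 'new' keeping the queue sorted by ascending rq_seqno. */
static void enqueue_transmit(struct req **head, struct req *new)
{
	struct req **pos = head;

	while (*pos && (*pos)->rq_seqno <= new->rq_seqno)
		pos = &(*pos)->next;
	new->next = *pos;
	*pos = new;
}

int main(void)
{
	struct req a = { .rq_seqno = 141360 };	/* seqnos from the trace */
	struct req b = { .rq_seqno = 141220 };
	struct req *q = NULL;

	enqueue_transmit(&q, &a);	/* arrive out of order */
	enqueue_transmit(&q, &b);

	for (struct req *r = q; r; r = r->next)
		printf("xmit seqno=%u\n", r->rq_seqno);	/* 141220, 141360 */
	return 0;
}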
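The second mitigation, having gss_xmit_need_reencode() refuse requests
that are more than 2/3 of the way down the window, might look roughly
like this; the fixed window size and all names here are assumptions for
illustration. Note that under this policy the first request in the
trace (116 entries behind in a 128-entry window, past the 85-entry
threshold) would have been re-encoded rather than sent at the window
edge, leaving headroom for the server-side Read-chunk reordering
described above.

/* Sketch of the proposed 2/3-of-window transmit gate. */
#include <stdbool.h>
#include <stdio.h>

#define GSS_SEQ_WIN 128	/* assumed known to the client */

/* seq_xmit: highest GSS sequence number already handed to the
 * transport; rq_seqno: this request's number. Re-encode once the
 * request has drifted more than 2/3 of the window behind. */
static bool xmit_need_reencode(unsigned int seq_xmit,
			       unsigned int rq_seqno)
{
	return seq_xmit - rq_seqno > (2 * GSS_SEQ_WIN) / 3;
}

int main(void)
{
	/* The request from the trace: 141336 - 141220 = 116 > 85. */
	printf("reencode %s\n",
	       xmit_need_reencode(141336, 141220) ? "needed" : "unneeded");
	return 0;
}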