Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

> On Jan 2, 2019, at 1:45 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
>>> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>>>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
>>>>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>> 
>>>>>> 
>>>> The test for rpcauth_xmit_need_reencode() happens when we call
>>>> xprt_request_transmit() to actually put the RPC call on the wire.
>>>> The enqueue order should not be able to defeat that test.
>>>> 
>>>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is failing
>>>> because this is a retransmission after a disconnect/reconnect that
>>>> didn't trigger a re-encode?
>>> 
>>> Actually, it might be worth a try to move the test for
>>> rpcauth_xmit_need_reencode() outside the enclosing test for
>>> req->rq_bytes_sent as that is just a minor optimisation.
>> 
>> Perhaps that's the case for TCP, but RPCs sent via xprtrdma never set
>> req->rq_bytes_sent to a non-zero value. The body of the "if" statement
>> is always executed for those RPCs.
>> 
> 
> Then the question is what is defeating the call to
> rpcauth_xmit_need_reencode() in xprt_request_transmit() and causing it
> not to trigger in the misordered cases?

Here's a sample RPC/RDMA case.

My instrumented server reports this:

Jan  2 13:29:00 klimt kernel: gss_check_seq_num: dropped: seq_num=141220 sd->sd_max=141360

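For context, the server-side check behaves roughly like the sliding-window
sketch below. This is a simplified illustration only, not the actual
svcauth_gss code; the 128-entry window size, the struct layout, and the
function name are my assumptions.

/*
 * Simplified sketch of a GSS sequence-number window check of the kind
 * described above. GSS_SEQ_WIN, the struct layout, and the function
 * name are illustrative assumptions, not the in-kernel implementation.
 */
#include <stdbool.h>

#define GSS_SEQ_WIN	128

struct seq_window {
	unsigned int	sd_max;			/* highest seq number seen so far */
	bool		sd_seen[GSS_SEQ_WIN];	/* replay bits for the window */
};

/* Returns true if the request may be processed, false if it is dropped. */
static bool seq_num_in_window(struct seq_window *sd, unsigned int seq_num)
{
	if (seq_num > sd->sd_max) {
		/* Slide the window forward to the new maximum. */
		while (sd->sd_max < seq_num) {
			sd->sd_max++;
			sd->sd_seen[sd->sd_max % GSS_SEQ_WIN] = false;
		}
	} else if (seq_num + GSS_SEQ_WIN <= sd->sd_max) {
		/* Fell behind the window: with the logged values,
		 * 141220 + 128 <= 141360, so this request is dropped. */
		return false;
	} else if (sd->sd_seen[seq_num % GSS_SEQ_WIN]) {
		/* Replay of a sequence number already seen in the window. */
		return false;
	}
	sd->sd_seen[seq_num % GSS_SEQ_WIN] = true;
	return true;
}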

ftrace log on the client shows this:

  kworker/u28:12-2191  [004]   194.048534: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
  kworker/u28:12-2191  [004]   194.048534: xprt_transmit:        task:1761@5 xid=0x88f4f47c seqno=141220 status=-57
  kworker/u28:12-2191  [004]   194.048534: rpc_task_run_action:  task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 action=call_transmit_status
  kworker/u28:12-2191  [004]   194.048535: rpc_task_run_action:  task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit
  kworker/u28:12-2191  [004]   194.048535: rpc_task_sleep:       task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 queue=xprt_sending


  kworker/u28:12-2191  [004]   194.048552: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
  kworker/u28:12-2191  [004]   194.048557: xprt_transmit:        task:1761@5 xid=0x88f4f47c seqno=141220 status=0


  kworker/u28:12-2191  [004]   194.048559: rpcgss_need_reencode: task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode unneeded
  kworker/u28:12-2191  [004]   194.048563: xprt_transmit:        task:1902@5 xid=0x14f5f47c seqno=141360 status=0


Note the first need_reencode entry: the sequence numbers show that the xmit
queue has been significantly re-ordered. The request being transmitted is
already very close to the lower end of the GSS sequence number window.

The server then re-orders these two slightly because the first one has
some Read chunks that need to be pulled over, while the second is pure
inline and can therefore be processed immediately. That is enough to force
the first one outside the GSS sequence number window.
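
Plugging in the numbers from the logs (and assuming the usual 128-entry
sequence window on both ends):

  client check at xmit time:  seq_xmit - rq_seqno = 141336 - 141220 = 116  (< 128, so "reencode unneeded")
  server check on arrival:    sd_max - seq_num    = 141360 - 141220 = 140  (>= 128, so the request is dropped)

So the request was still inside the window when rpcauth_xmit_need_reencode()
ran on the client, but had fallen out of it by the time the server's replay
cache saw it.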

I haven't looked closely at the pathology of the TCP case.


--
Chuck Lever

