Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Wed, 2019-01-02 at 14:08 -0500, Chuck Lever wrote:
> > On Jan 2, 2019, at 1:57 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
> > > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
> > > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > > 
> > > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
> > > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > The test for rpcauth_xmit_need_reencode() happens when we
> > > > > > > call xprt_request_transmit() to actually put the RPC call
> > > > > > > on the wire. The enqueue order should not be able to
> > > > > > > defeat that test.
> > > > > > > 
> > > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that
> > > > > > > is failing because this is a retransmission after a
> > > > > > > disconnect/reconnect that didn't trigger a re-encode?
> > > > > > 
> > > > > > Actually, it might be worth a try to move the test for
> > > > > > rpcauth_xmit_need_reencode() outside the enclosing test for
> > > > > > req->rq_bytes_sent as that is just a minor optimisation.
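
For reference, a rough sketch of what that reshuffle might look like in
xprt_request_transmit(). This is a simplified illustration only, not the
exact upstream code; the out_dequeue label, the error value and the
xprt_request_data_received() short-circuit are assumptions about the
surrounding function:

	if (!req->rq_bytes_sent && xprt_request_data_received(task)) {
		status = 0;
		goto out_dequeue;
	}
	/* Check the RPCSEC_GSS window unconditionally, i.e. also for
	 * requests that were already partially transmitted, since
	 * skipping it in that case was only a minor optimisation. */
	if (rpcauth_xmit_need_reencode(task)) {
		status = -EBADMSG;	/* assumed error value */
		goto out_dequeue;
	}

For transports that do track rq_bytes_sent, this would catch a
retransmission whose sequence number has slid out of the window even
though part of the request already went on the wire.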
> > > > > 
> > > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma
> > > > > never set req->rq_bytes_sent to a non-zero value. The body of
> > > > > the "if" statement is always executed for those RPCs.
> > > > > 
> > > > 
> > > > Then the question is what is defeating the call to
> > > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and
> > > > causing it not to trigger in the misordered cases?
> > > 
> > > Here's a sample RPC/RDMA case.
> > > 
> > > My instrumented server reports this:
> > > 
> > > Jan  2 13:29:00 klimt kernel: gss_check_seq_num: dropped: seq_num=141220 sd->sd_max=141360
> > > 
> > > 
> > > ftrace log on the client shows this:
> > > 
> > >  kworker/u28:12-2191  [004]   194.048534: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
> > >  kworker/u28:12-2191  [004]   194.048534: xprt_transmit:        task:1761@5 xid=0x88f4f47c seqno=141220 status=-57
> > >  kworker/u28:12-2191  [004]   194.048534: rpc_task_run_action:  task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 action=call_transmit_status
> > >  kworker/u28:12-2191  [004]   194.048535: rpc_task_run_action:  task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit
> > >  kworker/u28:12-2191  [004]   194.048535: rpc_task_sleep:       task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 queue=xprt_sending
> > > 
> > > 
> > >  kworker/u28:12-2191  [004]   194.048552: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
> > >  kworker/u28:12-2191  [004]   194.048557: xprt_transmit:        task:1761@5 xid=0x88f4f47c seqno=141220 status=0
> > > 
> > > 
> > >  kworker/u28:12-2191  [004]   194.048559: rpcgss_need_reencode: task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode unneeded
> > >  kworker/u28:12-2191  [004]   194.048563: xprt_transmit:        task:1902@5 xid=0x14f5f47c seqno=141360 status=0
> > > 
> > > 
> > > Note the first need_reencode event: the sequence numbers show that
> > > the xmit queue has been significantly re-ordered. The request being
> > > transmitted is already very close to the lower end of the GSS
> > > sequence number window.
> > > 
> > > The server then re-orders these two slightly because the first one
> > > had some Read chunks that needed to be pulled over, while the second
> > > was pure inline and therefore could be processed immediately. That
> > > is enough to force the first one outside the GSS sequence number
> > > window.
> > > 
> > > I haven't looked closely at the pathology of the TCP case.
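
To put rough numbers on that (assuming the Linux server's GSS_SEQ_WIN of
128): when the client ran its check, the newest transmitted sequence
number was 141336, and 141336 - 141220 = 116 is still inside the window,
so "reencode unneeded" was correct at that instant. By the time the
server got to the request, its high-water mark had advanced to
sd_max = 141360, and 141360 - 141220 = 140 falls outside a 128-entry
window, so gss_check_seq_num() dropped it.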
> > 
> > Wait a minute... That's not OK. The client can't be expected to take
> > into account reordering that happens on the server side.
> 
> Conversely, the client can't assume the transport and the server don't
> re-order. This does not appear to be a problem for the v4.19 client:
> I don't see disconnect storms with that client.

There is absolutely nothing stopping it from happening in 4.19. It's
just very unlikely because the stream is strictly ordered on the client
side. So the misordering on the server would have to be pretty extreme.
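
As a rough illustration of how little margin there is, the client-side
window test boils down to something like the following. This is a
simplified sketch, not the actual gss_xmit_need_reencode() in
auth_gss.c (which also updates the highest transmitted seqno), and
'win' stands in for the GSS context window advertised by the server:

	/* Sketch: a request must be re-encoded if its GSS sequence
	 * number has fallen a full window behind the newest sequence
	 * number already handed to the transport. */
	static bool seqno_needs_reencode(u32 rq_seqno, u32 seq_xmit, u32 win)
	{
		return (s32)(seq_xmit - rq_seqno) >= (s32)win;
	}

In the trace above that is 141336 - 141220 = 116 against a window of
128, so the client correctly reports "reencode unneeded", yet only about
a dozen further transmissions, or a modest amount of server-side
reordering, is enough to push that request out of the window.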

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





