Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

On Wed, 2019-01-02 at 13:57 -0500, Trond Myklebust wrote:
> On Wed, 2019-01-02 at 13:51 -0500, Chuck Lever wrote:
> > > On Jan 2, 2019, at 1:45 PM, Trond Myklebust <
> > > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > 
> > > On Wed, 2019-01-02 at 13:17 -0500, Chuck Lever wrote:
> > > > > On Dec 31, 2018, at 2:21 PM, Trond Myklebust <
> > > > > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
> > > > > > On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
> > > > > > > > On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
> > > > > > > > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > The test for rpcauth_xmit_need_reencode() happens when we
> > > > > > call xprt_request_transmit() to actually put the RPC call on
> > > > > > the wire. The enqueue order should not be able to defeat
> > > > > > that test.
> > > > > > 
> > > > > > Hmm... Is it perhaps the test for req->rq_bytes_sent that is
> > > > > > failing because this is a retransmission after a
> > > > > > disconnect/reconnect that didn't trigger a re-encode?
> > > > > 
> > > > > Actually, it might be worth a try to move the test for
> > > > > rpcauth_xmit_need_reencode() outside the enclosing test for
> > > > > req->rq_bytes_sent, as that is just a minor optimisation.
> > > > 
> > > > Perhaps that's the case for TCP, but RPCs sent via xprtrdma
> > > > never set req->rq_bytes_sent to a non-zero value. The body of
> > > > the "if" statement is always executed for those RPCs.
> > > > 
> > > 
> > > Then the question is what is defeating the call to
> > > rpcauth_xmit_need_reencode() in xprt_request_transmit() and
> > > causing it not to trigger in the misordered cases?
> > 
> > Here's a sample RPC/RDMA case.
> > 
> > My instrumented server reports this:
> > 
> > Jan  2 13:29:00 klimt kernel: gss_check_seq_num: dropped: seq_num=141220 sd->sd_max=141360
> > 
> > 
> > ftrace log on the client shows this:
> > 
> >   kworker/u28:12-2191  [004]   194.048534: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
> >   kworker/u28:12-2191  [004]   194.048534: xprt_transmit:        task:1761@5 xid=0x88f4f47c seqno=141220 status=-57
> >   kworker/u28:12-2191  [004]   194.048534: rpc_task_run_action:  task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-57 action=call_transmit_status
> >   kworker/u28:12-2191  [004]   194.048535: rpc_task_run_action:  task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit
> >   kworker/u28:12-2191  [004]   194.048535: rpc_task_sleep:       task:1779@5 flags=ASYNC runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-11 timeout=0 queue=xprt_sending
> > 
> >   kworker/u28:12-2191  [004]   194.048552: rpcgss_need_reencode: task:1761@5 xid=0x88f4f47c rq_seqno=141220 seq_xmit=141336 reencode unneeded
> >   kworker/u28:12-2191  [004]   194.048557: xprt_transmit:        task:1761@5 xid=0x88f4f47c seqno=141220 status=0
> > 
> >   kworker/u28:12-2191  [004]   194.048559: rpcgss_need_reencode: task:1902@5 xid=0x14f5f47c rq_seqno=141360 seq_xmit=141336 reencode unneeded
> >   kworker/u28:12-2191  [004]   194.048563: xprt_transmit:        task:1902@5 xid=0x14f5f47c seqno=141360 status=0
> > 
> > 
> > Note the first need_reencode event: the sequence numbers show that
> > the xmit queue has been significantly re-ordered. The request being
> > transmitted is already very close to the lower end of the GSS
> > sequence number window.
> > 
> > The server then re-orders these two slightly because the first one
> > had some Read chunks that needed to be pulled over, while the second
> > was pure inline and therefore could be processed immediately. That
> > is enough to force the first one outside the GSS sequence number
> > window.
> > 
> > I haven't looked closely at the pathology of the TCP case.
> 
> Wait a minute... That's not OK. The client can't be expected to take
> into account reordering that happens on the server side.

If that's the case, then we would need to halt transmission as soon as
we hit the RPCSEC_GSS window edge. Off the cuff, I'm not sure how to do
that, since those windows are per session (i.e. per user).

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
