Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote:
> > On Dec 31, 2018, at 1:09 PM, Trond Myklebust <
> > trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
> > > > On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
> > > > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > 
> > > > 
> > > > > On Dec 27, 2018, at 20:21, Chuck Lever <
> > > > > chuck.lever@xxxxxxxxxx>
> > > > > wrote:
> > > > > 
> > > > > Hi Trond-
> > > > > 
> > > > > I've chased down a couple of remaining regressions with the
> > > > > v4.20
> > > > > NFS client,
> > > > > and they seem to be rooted in this commit.
> > > > > 
> > > > > When using sec=krb5, krb5i, or krb5p I found that multi-
> > > > > threaded
> > > > > workloads
> > > > > trigger a lot of server-side disconnects. This is with TCP
> > > > > and
> > > > > RDMA transports.
> > > > > An instrumented server shows that the client is under-running 
> > > > > the
> > > > > GSS sequence
> > > > > number window. I monitored the order in which GSS sequence
> > > > > numbers appear on
> > > > > the wire, and after this commit, the sequence numbers are
> > > > > wildly
> > > > > misordered.
> > > > > If I revert the hunk in xprt_request_enqueue_transmit, the
> > > > > problem goes away.
> > > > > 
> > > > > I also found that reverting that hunk results in a 3-4%
> > > > > improvement in fio
> > > > > IOPS rates, as well as improvement in average and maximum
> > > > > latency
> > > > > as reported
> > > > > by fio.
> > > > > 
> > > > 
> > > > Hmm… Provided the sequence numbers still lie within the window,
> > > > then why would the order matter?
> > > 
> > > The misordering is so bad that one request is delayed long enough
> > > to
> > > fall outside the window. The new “need re-encode” logic does not
> > > trigger.
> > > 
> > 
> > That's weird. I can't see anything wrong with need re-encode at
> > this
> > point.
> 
> I don't think there is anything wrong with it, it looks like it's
> not called in this case.

So you are saying that the call to rpcauth_xmit_need_reencode() is
triggering the EBADMSG, but that this fails to cause a re-encode of the
message?

> 
> > Do the window sizes agree on the client and the server?
> 
> Yes, both are 128. I also tried with 64 on the client side and 128
> on the server side. That reduces the frequency of disconnects, but
> does not eliminate them.
> 
> I'm not clear what problem the logic in xprt_request_enqueue_transmit
> is trying to address. It seems to me that the initial, simple
> implementation of this function is entirely adequate..?

I agree that the fair queueing code could result in a reordering that
could screw up the RPCSEC_GSS sequencing. However, we do expect the
need reencode stuff to catch that.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux