Re: [PATCH v3 26/44] SUNRPC: Improve latency for interactive tasks

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 2 Jan 2019 13:17:59 -0500

> On Dec 31, 2018, at 2:21 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Mon, 2018-12-31 at 19:18 +0000, Trond Myklebust wrote:
>> On Mon, 2018-12-31 at 14:09 -0500, Chuck Lever wrote:
>>>> On Dec 31, 2018, at 1:59 PM, Trond Myklebust <
>>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>>> 
>>>> On Mon, 2018-12-31 at 13:44 -0500, Chuck Lever wrote:
>>>>>> On Dec 31, 2018, at 1:09 PM, Trond Myklebust <
>>>>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>> 
>>>>>> On Thu, 2018-12-27 at 17:34 -0500, Chuck Lever wrote:
>>>>>>>> On Dec 27, 2018, at 5:14 PM, Trond Myklebust <
>>>>>>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Dec 27, 2018, at 20:21, Chuck Lever <
>>>>>>>>> chuck.lever@xxxxxxxxxx>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Trond-
>>>>>>>>> 
>>>>>>>>> I've chased down a couple of remaining regressions with
>>>>>>>>> the
>>>>>>>>> v4.20
>>>>>>>>> NFS client,
>>>>>>>>> and they seem to be rooted in this commit.
>>>>>>>>> 
>>>>>>>>> When using sec=krb5, krb5i, or krb5p I found that
>>>>>>>>> multi-
>>>>>>>>> threaded
>>>>>>>>> workloads
>>>>>>>>> trigger a lot of server-side disconnects. This is with
>>>>>>>>> TCP
>>>>>>>>> and
>>>>>>>>> RDMA transports.
>>>>>>>>> An instrumented server shows that the client is under-
>>>>>>>>> running 
>>>>>>>>> the
>>>>>>>>> GSS sequence
>>>>>>>>> number window. I monitored the order in which GSS
>>>>>>>>> sequence
>>>>>>>>> numbers appear on
>>>>>>>>> the wire, and after this commit, the sequence numbers
>>>>>>>>> are
>>>>>>>>> wildly
>>>>>>>>> misordered.
>>>>>>>>> If I revert the hunk in xprt_request_enqueue_transmit,
>>>>>>>>> the
>>>>>>>>> problem goes away.
>>>>>>>>> 
>>>>>>>>> I also found that reverting that hunk results in a 3-4%
>>>>>>>>> improvement in fio
>>>>>>>>> IOPS rates, as well as improvement in average and
>>>>>>>>> maximum
>>>>>>>>> latency
>>>>>>>>> as reported
>>>>>>>>> by fio.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hmm… Provided the sequence numbers still lie within the
>>>>>>>> window,
>>>>>>>> then why would the order matter?
>>>>>>> 
>>>>>>> The misordering is so bad that one request is delayed long
>>>>>>> enough
>>>>>>> to
>>>>>>> fall outside the window. The new “need re-encode” logic
>>>>>>> does
>>>>>>> not
>>>>>>> trigger.
>>>>>>> 
>>>>>> 
>>>>>> That's weird. I can't see anything wrong with need re-encode
>>>>>> at
>>>>>> this
>>>>>> point.
>>>>> 
>>>>> I don't think there is anything wrong with it, it looks like
>>>>> it's
>>>>> not called in this case.
>>>> 
>>>> So you are saying that the call to rpcauth_xmit_need_reencode()
>>>> is
>>>> triggering the EBADMSG, but that this fails to cause a re-encode
>>>> of
>>>> the
>>>> message?
>>> 
>>> No, I think what's going on is that the need_reencode happens when
>>> the
>>> RPC is enqueued, and is successful.
>>> 
>>> But xprt_request_enqueue_transmit places the RPC somewhere in the
>>> middle
>>> of xmit_queue. xmit_queue is long enough that more than 128
>>> requests
>>> are
>>> before the enqueued request.
>> 
>> The test for rpcauth_xmit_need_reencode() happens when we call
>> xprt_request_transmit() to actually put the RPC call on the wire. The
>> enqueue order should not be able to defeat that test.
>> 
>> Hmm... Is it perhaps the test for req->rq_bytes_sent that is failing
>> because this is a retransmission after a disconnect/reconnect that
>> didn't trigger a re-encode?
> 
> Actually, it might be worth a try to move the test for
> rpcauth_xmit_need_reencode() outside the enclosing test for req-
>> rq_bytes_sent as that is just a minor optimisation.

Perhaps that's the case for TCP, but RPCs sent via xprtrdma never set
req->rq_bytes_sent to a non-zero value. The body of the "if" statement
is always executed for those RPCs.

--
Chuck Lever