Re: question about the performance impact of sec=krb5

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Mon, 13 Feb 2023 18:36:34 +0000

> On Feb 13, 2023, at 12:45, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> 
> On Mon, Feb 13, 2023 at 10:38 AM Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
>> 
>> 
>> 
>>> On Feb 13, 2023, at 09:55, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>> 
>>> On Sun, Feb 12, 2023 at 1:08 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Feb 12, 2023, at 1:01 AM, Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> question about the performance of sec=krb5.
>>>>> 
>>>>> https://learn.microsoft.com/en-us/azure/azure-netapp-files/performance-impact-kerberos
>>>>> Performance impact of krb5:
>>>>>     Average IOPS decreased by 53%
>>>>>     Average throughput decreased by 53%
>>>>>     Average latency increased by 3.2 ms
>>>> 
>>>> Looking at the numbers in this article... they don't
>>>> seem quite right. Here are the others:
>>>> 
>>>>> Performance impact of krb5i:
>>>>>     • Average IOPS decreased by 55%
>>>>>     • Average throughput decreased by 55%
>>>>>     • Average latency increased by 0.6 ms
>>>>> Performance impact of krb5p:
>>>>>     • Average IOPS decreased by 77%
>>>>>     • Average throughput decreased by 77%
>>>>>     • Average latency increased by 1.6 ms
>>>> 
>>>> I would expect krb5p to be the worst in terms of
>>>> latency. And I would like to see round-trip numbers
>>>> reported: what part of the increase in latency is
>>>> due to server versus client processing?
>>>> 
>>>> This is also remarkable:
>>>> 
>>>>> When nconnect is used in Linux, the GSS security context is shared between all the nconnect connections to a particular server. TCP is a reliable transport that supports out-of-order packet delivery to deal with out-of-order packets in a GSS stream, using a sliding window of sequence numbers. When packets not in the sequence window are received, the security context is discarded, and a new security context is negotiated. All messages sent within the now-discarded context are no longer valid, thus requiring the messages to be sent again. Larger number of packets in an nconnect setup cause frequent out-of-window packets, triggering the described behavior. No specific degradation percentages can be stated with this behavior.
>>>> 
>>>> 
>>>> So, does this mean that nconnect makes the GSS sequence
>>>> window problem worse, or that when a window underrun
>>>> occurs it has broader impact because multiple connections
>>>> are affected?
>>> 
>>> Yes, nconnect makes the GSS sequence window problem worse (it is very
>>> typical to generate more RPCs than the GSS window size, with no
>>> ability to control the order in which they are sent), and yes, all
>>> connections are affected. ONTAP, like Linux, uses a GSS window size of
>>> 128, but we've experimented with increasing it to larger values and it
>>> still caused issues.
>>> 
>>>> Seems like maybe nconnect should set up a unique GSS
>>>> context for each xprt. It would be helpful to file a bug.
>>> 
>>> At the time when I saw the issue and asked about it (though can't find
>>> a reference now) I got the impression that having multiple contexts
>>> for the same rpc client was not going to be acceptable.
>>> 
>> 
>> We have discussed this earlier on this mailing list. To me, the two issues are separate.
>> - It would be nice to enforce the GSS window on the client, and to throttle further RPC calls from using a context once the window is full.
>> - It might also be nice to allow for multiple contexts on the client and to have them assigned on a per-xprt basis so that the number of slots scales with the number of connections.
>> 
>> Note though, that window issues do tend to be mitigated by the NFSv4.x (x>0) sessions. It would make sense for server vendors to ensure that they match the GSS window size to the max number of session slots.
> 
> Matching max session slots to the GSS window size doesn't help, but perhaps
> my understanding of the flow is wrong. Typically all these runs are
> done with the client's default session slot count, which is only 64 slots
> (the server's session slot count is higher). The session slot assignment
> happens after the GSS sequence number assignment. So we have a bunch of
> requests that have been given GSS sequence numbers beyond the window
> and then wait for a slot assignment, but by the time they are
> sent they are already outside the sequence window.
> 

The NFSv4.x session slot is normally assigned before we kick off the RPC state machine in ‘call_start()’. So if you are limited to 64 session slots, then that will prevent you from exceeding the 128-entry GSS window.
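
To make the ordering argument concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not the kernel code path: every function name is hypothetical, requests are handled in whole batches of 64 (each batch completes before the next starts), and the transmit order inside the reorderable set is fully reversed as a worst case. The window check follows the RPCSEC_GSS rule that a request is outside the window when its sequence number is at or below the highest number seen minus the window size.

```python
GSS_WINDOW = 128     # GSS sequence window size quoted in this thread
SESSION_SLOTS = 64   # client default session slot count quoted in this thread


def count_out_of_window(arrival_seqs, window=GSS_WINDOW):
    """Count requests that arrive below the server's sequence window."""
    highest_seen = 0
    out_of_window = 0
    for seq in arrival_seqs:
        if seq > highest_seen:
            highest_seen = seq            # window slides forward
        elif seq <= highest_seen - window:
            out_of_window += 1            # below N - window + 1
    return out_of_window


def seq_before_slot(n_requests, reorder):
    """Hypothetical flow: every request gets a GSS sequence number first,
    then waits for a session slot, so the whole backlog can be reordered."""
    return reorder(list(range(1, n_requests + 1)))


def slot_before_seq(n_requests, reorder, slots=SESSION_SLOTS):
    """Hypothetical flow: a request takes a session slot first, so only
    slot holders carry sequence numbers and reordering is per batch."""
    arrivals = []
    next_seq = 1
    while next_seq <= n_requests:
        batch_end = min(next_seq + slots, n_requests + 1)
        arrivals.extend(reorder(list(range(next_seq, batch_end))))
        next_seq = batch_end
    return arrivals


def worst_case(seqs):
    """Assumed worst-case transmit order: fully reversed."""
    return list(reversed(seqs))


if __name__ == "__main__":
    print(count_out_of_window(seq_before_slot(1000, worst_case)))  # 872
    print(count_out_of_window(slot_before_seq(1000, worst_case)))  # 0
```

With the slot taken first, the spread of sequence numbers among the requests that can be reordered is at most 63, which is well inside the 128-entry window. The batch assumption matters: a real client frees slots one at a time, so a single request that is delayed for a long time while other slots keep cycling could still fall behind the window.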

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx