> On Feb 13, 2023, at 12:45, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>
> On Mon, Feb 13, 2023 at 10:38 AM Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>
>>> On Feb 13, 2023, at 09:55, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>
>>> On Sun, Feb 12, 2023 at 1:08 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>>
>>>>> On Feb 12, 2023, at 1:01 AM, Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> question about the performance of sec=krb5.
>>>>>
>>>>> https://learn.microsoft.com/en-us/azure/azure-netapp-files/performance-impact-kerberos
>>>>> Performance impact of krb5:
>>>>> Average IOPS decreased by 53%
>>>>> Average throughput decreased by 53%
>>>>> Average latency increased by 3.2 ms
>>>>
>>>> Looking at the numbers in this article... they don't
>>>> seem quite right. Here are the others:
>>>>
>>>>> Performance impact of krb5i:
>>>>> • Average IOPS decreased by 55%
>>>>> • Average throughput decreased by 55%
>>>>> • Average latency increased by 0.6 ms
>>>>> Performance impact of krb5p:
>>>>> • Average IOPS decreased by 77%
>>>>> • Average throughput decreased by 77%
>>>>> • Average latency increased by 1.6 ms
>>>>
>>>> I would expect krb5p to be the worst in terms of
>>>> latency. And I would like to see round-trip numbers
>>>> reported: what part of the increase in latency is
>>>> due to server versus client processing?
>>>>
>>>> This is also remarkable:
>>>>
>>>>> When nconnect is used in Linux, the GSS security context is shared between all the nconnect connections to a particular server. TCP is a reliable transport that supports out-of-order packet delivery to deal with out-of-order packets in a GSS stream, using a sliding window of sequence numbers. When packets not in the sequence window are received, the security context is discarded, and a new security context is negotiated. All messages sent with in the now-discarded context are no longer valid, thus requiring the messages to be sent again.
>>>>> Larger number of packets in an nconnect setup cause frequent out-of-window packets, triggering the described behavior. No specific degradation percentages can be stated with this behavior.
>>>>
>>>> So, does this mean that nconnect makes the GSS sequence
>>>> window problem worse, or that when a window underrun
>>>> occurs it has broader impact because multiple connections
>>>> are affected?
>>>
>>> Yes nconnect makes the GSS sequence window problem worse (very typical
>>> to generate more than gss window size number of rpcs and have no
>>> ability to control in what order they would be sent) and yes all
>>> connections are affected. ONTAP as linux uses 128 gss window size but
>>> we've experimented with increasing it to larger values and it would
>>> still cause issues.
>>>
>>>> Seems like maybe nconnect should set up a unique GSS
>>>> context for each xprt. It would be helpful to file a bug.
>>>
>>> At the time when I saw the issue and asked about it (though can't find
>>> a reference now) I got the impression that having multiple contexts
>>> for the same rpc client was not going to be acceptable.
>>>
>>
>> We have discussed this earlier on this mailing list. To me, the two issues are separate.
>> - It would be nice to enforce the GSS window on the client, and to throttle further RPC calls from using a context once the window is full.
>> - It might also be nice to allow for multiple contexts on the client and to have them assigned on a per-xprt basis so that the number of slots scales with the number of connections.
>>
>> Note though, that window issues do tend to be mitigated by the NFSv4.x (x>0) sessions. It would make sense for server vendors to ensure that they match the GSS window size to the max number of session slots.
>
> Matching max session slots to gss window size doesn't help but perhaps
> my understanding of the flow is wrong.
> Typically all these runs are
> done with the client's default session slot # which is only 64slots
> (server's session slot size is higher). The session slot assignment
> happens after the gss sequence assignment. So we have a bunch of
> requests that have gotten gss sequence numbers that exceed the window
> slot and then they go wait for the slot assignment but when they are
> sent they are already out of sequence window.
>

The NFSv4.x session slot is normally assigned before we kick off the RPC state machine in ‘call_start()’. So if you are limited to 64 session slots, then that will prevent you from exceeding the GSS 128 entry window.

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
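
To make the sequence-window discussion in this thread concrete, here is a minimal Python sketch of a server-side RPCSEC_GSS sequence window check in the spirit of RFC 2203. It is an illustration only: `GssSeqWindow` and `count_rejected` are hypothetical names invented for this example, not Linux or ONTAP code, and the default size of 128 is simply the figure quoted above. The model covers only whether a sequence number is acceptable; what a server does after rejecting a request (a silent drop, or discarding the context as the quoted article describes) is outside the sketch.

```python
class GssSeqWindow:
    """Toy model of a server-side GSS sequence window (hypothetical helper)."""

    def __init__(self, size=128):
        self.size = size      # window size; 128 is the value quoted in the thread
        self.highest = 0      # highest sequence number accepted so far
        self.seen = set()     # sequence numbers already seen inside the window

    def accept(self, seq):
        """Return True if seq is inside the window and has not been seen before."""
        if seq > self.highest:
            # A newer sequence number slides the window forward.
            self.highest = seq
            low = self.highest - self.size
            self.seen = {s for s in self.seen if s > low}
            self.seen.add(seq)
            return True
        if seq <= self.highest - self.size:
            return False      # fell below the window
        if seq in self.seen:
            return False      # repeated sequence number
        self.seen.add(seq)
        return True


def count_rejected(arrival_order, size=128):
    """Count how many requests the toy window rejects for a given arrival order."""
    window = GssSeqWindow(size)
    return sum(0 if window.accept(seq) else 1 for seq in arrival_order)
```

Under this model, delivering sequence numbers 1 through 300 in order rejects nothing, and any reordering of only 64 outstanding numbers also rejects nothing with a 128-entry window. Delivering 2 through 200 first and then 1 rejects the late request, because 1 is more than 128 behind the highest number seen.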