Re: question about the performance impact of sec=krb5

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Mon, 13 Feb 2023 01:07:13 +0000

> On Feb 12, 2023, at 5:45 PM, Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote:
> 
> Hi,
> 
>> 
>> 
>>> On Feb 12, 2023, at 1:01 AM, Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote:
>>> 
>>> Hi,
>>> 
>>> question about the performance of sec=krb5.
>>> 
>>> https://learn.microsoft.com/en-us/azure/azure-netapp-files/performance-impact-kerberos
>>> Performance impact of krb5:
>>> 	Average IOPS decreased by 53%
>>> 	Average throughput decreased by 53%
>>> 	Average latency increased by 3.2 ms
>> 
>> Looking at the numbers in this article... they don't
>> seem quite right. Here are the others:
>> 
>>> Performance impact of krb5i:
>>> 	Average IOPS decreased by 55%
>>> 	Average throughput decreased by 55%
>>> 	Average latency increased by 0.6 ms
>>> Performance impact of krb5p:
>>> 	Average IOPS decreased by 77%
>>> 	Average throughput decreased by 77%
>>> 	Average latency increased by 1.6 ms
>> 
>> I would expect krb5p to be the worst in terms of
>> latency. And I would like to see round-trip numbers
>> reported: what part of the increase in latency is
>> due to server versus client processing?
>> 
>> This is also remarkable:
>> 
>>> When nconnect is used in Linux, the GSS security context is shared between all the nconnect connections to a particular server. TCP is a reliable transport that supports out-of-order packet delivery to deal with out-of-order packets in a GSS stream, using a sliding window of sequence numbers. When packets not in the sequence window are received, the security context is discarded, and a new security context is negotiated. All messages sent with in the now-discarded context are no longer valid, thus requiring the messages to be sent again. Larger number of packets in an nconnect setup cause frequent out-of-window packets, triggering the described behavior. No specific degradation percentages can be stated with this behavior.
>> 
>> 
>> So, does this mean that nconnect makes the GSS sequence
>> window problem worse, or that when a window underrun
>> occurs it has broader impact because multiple connections
>> are affected?
>> 
>> Seems like maybe nconnect should set up a unique GSS
>> context for each xprt. It would be helpful to file a bug.
>> 
>> 
>>> and then in 'man 5 nfs'
>>> sec=krb5  provides cryptographic proof of a user's identity in each RPC request.
>> 
>> Kerberos has performance impacts due to the crypto-
>> graphic operations that are performed on even small
>> fixed-sized sections of each RPC message, when using
>> sec=krb5 (no 'i' or 'p').
>> 
>> 
>>> Is there an option for better performance that checks krb5 only at mount.nfs4 time,
>>> not on file access?
>> 
>> If you mount with NFSv4 and sec=sys from a Linux NFS
>> client that has a keytab, the client will attempt to
>> use krb5i for lease management operations (such as
>> EXCHANGE_ID) but it will continue to use sec=sys for
>> user authentication. That's not terribly secure.
> 
> I noticed this feature in this case
> - the nfs client joined the windows AD(then have a keytab)
> - the windows AD server is shutdown.
> then 'mount.nfs4 -o sec=sys' will take about 3 min.
> because there are 60s timeout  *3 inside.
> but 'sec=sys' does not need any krb5 operations?

I would expect some bad behavior in this case: the
client is using Kerberos while part of the network
service infrastructure is not available to it. It's
going to hang.

If you don't want sec=sys to hang, then either don't
take the AD offline, don't put a keytab on the client,
or don't use NFSv4.

> maybe we can have another krb5 mode, such as 'krb5l'
> - the nfs client must have a keytab.
> - krb5 must be used only when mount.nfs4

It's not that simple.

All mount points on that client of that server share the
same lease, whether they are sec=sys or sec=krb5*. The
krb5 mounts must use krb5 for lease management, the
sec=sys mounts may use it, but don't have to.

What's more, when the client reboots, it needs to re-
identify itself to the server using the same credential,
no matter which order the mounts are re-established --
sys first or krb5 first.

Or, more generally speaking, when a keytab is present,
even if the client has only sec=sys mounts at this moment,
it might establish a sec=krb5 mount at any time in the
future. For instance, consider the case where only sec=sys
mounts reside in /etc/fstab that get mounted at boot time,
but there are sec=krb5 mounts in an automounter map that
get pulled in when a user accesses them.

In other words, it's not a per-mount setting, and it has
to be the same principal and security flavor after every
client reboot. We picked an appropriate level of security
for lease management that meets these requirements. The
only choice is to use Kerberos if there is even the
possibility that sec=krb5* can be used.

It might be surprising behavior, until you realize this
is kind of the only way it can work with a single lease
per client. Plus it encourages better security.
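
To make the rule concrete, here is a rough sketch in Python. It
is only a model of the behavior described in this thread, not
the client's actual code: lease_flavor() and mount_flavors()
are made-up names, and the "no keytab means AUTH_SYS for the
lease" branch is my assumption about the fallback.

```python
def lease_flavor(has_keytab: bool) -> str:
    """Flavor the client attempts for lease management (e.g. EXCHANGE_ID).

    Depends only on whether a keytab exists, never on any one mount's
    sec= option, because every mount of a server shares one lease.
    """
    # Assumption: without a keytab the client falls back to AUTH_SYS.
    return "krb5i" if has_keytab else "sys"


def mount_flavors(has_keytab: bool, mount_sec: str) -> dict:
    """Hypothetical helper: flavors used by one mount point."""
    return {
        "lease": lease_flavor(has_keytab),  # per-client decision
        "user_io": mount_sec,               # per-mount decision
    }


if __name__ == "__main__":
    # A sec=sys mount on a client with a keytab still attempts krb5i
    # for the lease, which is why it waits on an unreachable KDC.
    print(mount_flavors(has_keytab=True, mount_sec="sys"))
    print(mount_flavors(has_keytab=True, mount_sec="krb5"))
    print(mount_flavors(has_keytab=False, mount_sec="sys"))
```

The point of the sketch is that the "lease" value is the same
for the first two cases, so the order in which sys and krb5
mounts are established does not matter.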

> It would be more secure than the IP address check in /etc/exports?

Well, it would provide some degree of peer authentication
based on whatever principal is available on the client
(a host service principal or some user that wants to
provide a password for this purpose).

But then user I/O requests would use AUTH_SYS, which is
trivial to alter while the RPC messages transit an open
network. That's what I meant by not terribly secure. But
better than all AUTH_SYS, sure.
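
To illustrate "trivial to alter", here is a small Python sketch
that XDR-encodes an AUTH_SYS credential body following the
authsys_parms layout in RFC 5531 (stamp, machine name, uid,
gid, supplementary gids) and then rewrites the uid in place.
The helper names and the sample values ("client1", uid 1000)
are mine, purely for illustration. Nothing in the credential
is signed, so the modified bytes are just as well-formed as
the original.

```python
import struct


def xdr_string(data: bytes) -> bytes:
    """XDR opaque/string: 4-byte length, data, zero padding to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def encode_authsys(stamp, machine, uid, gid, gids):
    """Encode an AUTH_SYS credential body (authsys_parms, RFC 5531)."""
    body = struct.pack(">I", stamp) + xdr_string(machine.encode())
    body += struct.pack(">II", uid, gid)
    body += struct.pack(">I", len(gids))
    body += b"".join(struct.pack(">I", g) for g in gids)
    return body


def uid_offset(body: bytes) -> int:
    """Byte offset of the uid field: stamp, then the padded machine name."""
    name_len = struct.unpack_from(">I", body, 4)[0]
    return 8 + ((name_len + 3) & ~3)


def decode_uid(body: bytes) -> int:
    return struct.unpack_from(">I", body, uid_offset(body))[0]


def rewrite_uid(body: bytes, new_uid: int) -> bytes:
    """What an on-path attacker could do: overwrite four bytes."""
    off = uid_offset(body)
    return body[:off] + struct.pack(">I", new_uid) + body[off + 4:]


if __name__ == "__main__":
    cred = encode_authsys(0, "client1", 1000, 1000, [1000])
    forged = rewrite_uid(cred, 0)
    print(decode_uid(cred), decode_uid(forged))
```

With krb5 (even without 'i' or 'p') the RPC header carrying
the credential is covered by a GSS checksum, so this kind of
edit would be detected by the server.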

> Best Regards
> Wang Yugui (wangyugui@xxxxxxxxxxxx)
> 2023/02/13
> 
> 
>> 
>> A better answer would be to make Kerberos faster.
>> I've done some recent work on improving the overhead
>> of using message digest algorithms with GSS-API, but
>> haven't done any specific measurement. I'm sure
>> there's more room for optimization.
>> 
>> Even better would be to use a transport layer security
>> service. Amazon has EFS and Oracle Cloud has something
>> similar, but we're working on a standard approach that
>> uses TLSv1.3.
>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever