On Mon, Feb 03, 2025 at 02:23:19PM +0000, David Howells wrote:
> [!] Note that the net/sunrpc/auth_gss/ implementation gets a pair of
> ciphers, one non-CTS and one CTS, using the former to do all the
> aligned blocks and the latter to do the last two blocks if they
> aren't also aligned.  It may be necessary to do this here too for
> performance reasons - but there are considerations both ways:
>
> (1) firstly, there is an optimised assembly version of cts(cbc(aes))
>     on x86_64 that should be used instead of having two ciphers;
>
> (2) secondly, none of the hardware offload drivers seem to offer CTS
>     support (Intel QAT does not, for instance).
>
> However, I don't know if it's possible to query the crypto API to
> find out whether there's an optimised CTS algorithm available.

Linux's "cts" is specifically the CS3 variant of CTS (using the
terminology of NIST SP800-38A,
https://dl.acm.org/doi/pdf/10.5555/2206248), which unconditionally
swaps the last two blocks.  Is that the variant that is needed here?
SP800-38A mentions that CS3 is the variant used in Kerberos 5, so I
assume yes.  If yes, then you need to use cts(cbc(aes))
unconditionally.  (BTW, I hope you have some test that shows that you
actually implemented the Kerberos protocol correctly?)

x86_64 already has an AES-NI assembly optimized cts(cbc(aes)), as you
mentioned.  I will probably add a VAES optimized cts(cbc(aes)) at some
point; I've just been doing other modes first.

I don't see why off-CPU hardware offload support deserves much
attention here, given the extremely high speed of on-CPU crypto these
days and the great difficulty of integrating off-CPU acceleration
efficiently.  In particular it seems weird to consider Intel QAT a
reasonable thing to use over VAES.  Regardless, absent direct support
for cts(cbc(aes)), the cts template will build it on top of cbc(aes)
anyway.
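For illustration, here is a minimal, untested sketch of what
unconditionally requesting cts(cbc(aes)) through the sync skcipher API
could look like.  The helper function and its parameters are made up;
only the crypto API calls are real.  The point is that the API resolves
"cts(cbc(aes))" to the best implementation available - the AES-NI
assembly version on x86_64, or the generic cts template wrapped around
cbc(aes) otherwise - so the caller never needs to query for an
optimised variant:

#include <crypto/skcipher.h>
#include <linux/scatterlist.h>
#include <linux/err.h>

/* Hypothetical helper: one-shot in-place CTS-CBC-AES encryption.
 * Requires len >= AES_BLOCK_SIZE, as CTS needs at least one full block.
 */
static int krb5_cts_encrypt_sketch(const u8 *key, unsigned int keylen,
				   u8 *iv, struct scatterlist *sg,
				   unsigned int len)
{
	struct crypto_sync_skcipher *tfm;
	int err;

	/* Unconditionally ask for cts(cbc(aes)); the crypto API picks
	 * the fastest implementation it has. */
	tfm = crypto_alloc_sync_skcipher("cts(cbc(aes))", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_sync_skcipher_setkey(tfm, key, keylen);
	if (!err) {
		SYNC_SKCIPHER_REQUEST_ON_STACK(req, tfm);

		skcipher_request_set_sync_tfm(req, tfm);
		skcipher_request_set_callback(req, 0, NULL, NULL);
		skcipher_request_set_crypt(req, sg, sg, len, iv);
		err = crypto_skcipher_encrypt(req);
		skcipher_request_zero(req);
	}

	crypto_free_sync_skcipher(tfm);
	return err;
}

- Eric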