On Fri, 4 Dec 2020 at 17:52, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> > OK, I guess I don't understand the question.  I haven't thought about
> > this code in at least a decade.  What's an auxiliary cipher?  Is this a
> > question about why we're implementing something, or how we're
> > implementing it?
>
> That's what the Linux sunrpc implementation calls them:
>
>     struct crypto_sync_skcipher *acceptor_enc;
>     struct crypto_sync_skcipher *initiator_enc;
>     struct crypto_sync_skcipher *acceptor_enc_aux;
>     struct crypto_sync_skcipher *initiator_enc_aux;
>
> Auxiliary ciphers aren't mentioned in rfc396{1,2} so it appears to be
> something peculiar to that implementation.
>
> So acceptor_enc and acceptor_enc_aux, for instance, are both based on the same
> key, and the implementation seems to pass the IV from one to the other.  The
> only difference is that the 'aux' cipher lacks the CTS wrapping - which only
> makes a difference for the final two blocks[*] of the encryption (or
> decryption) - and only if the data doesn't fully fill out the last block
> (ie. it needs padding in some way so that the encryption algorithm can handle
> it).
>
> [*] Encryption cipher blocks, that is.
>
> So I think its purpose is twofold:
>
>  (1) It's a way to be a bit more efficient, cutting out the CTS layer's
>      indirection and additional buffering.
>
>  (2) crypto_skcipher_encrypt() assumes that it's doing the entire crypto
>      operation in one go and will always impose the final CTS bit, so you
>      can't call it repeatedly to progress through a buffer (as
>      xdr_process_buf() would like to do) as that would corrupt the data being
>      encrypted - unless you made sure that the data was always block-size
>      aligned (in which case, there's no point using CTS).
>
> I wonder how much going through three layers of crypto modules costs.  Looking
> at how AES can be implemented using, say, Intel AES instructions, it looks like
> AES+CBC should be easy to do in a single module.  I wonder if we could have
> optimised kerberos crypto that does the AES and the SHA together in a single
> loop.
>

The tricky thing with CTS is that you have to ensure that the final full
and partial blocks are presented to the crypto driver as one chunk, or it
won't be able to perform the ciphertext stealing. This might be the reason
for the current approach. If the sunrpc code has multiple disjoint chunks
of data to encrypt, it is always better to wrap them in a single
scatterlist and call into the skcipher only once.

However, I would recommend against it: at least for ARM and arm64, I have
already contributed SIMD-based implementations that use SIMD permutation
instructions and overlapping loads and stores to perform the ciphertext
stealing, which means that there is only a single layer which implements
CTS+CBC+AES, and this layer can consume the entire scatterlist in one go.
We could easily do something similar in the AES-NI driver as well.