Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:

> The tricky thing with CTS is that you have to ensure that the final
> full and partial blocks are presented to the crypto driver as one
> chunk, or it won't be able to perform the ciphertext stealing. This
> might be the reason for the current approach. If the sunrpc code has
> multiple disjoint chunks of data to encrypt, it is always better to
> wrap it in a single scatterlist and call into the skcipher only once.

Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
plus bits of non-contiguous pages, requiring >8K of scatterlist elements
(admittedly, we can chain them, but we may have to do one or more large
allocations).

> However, I would recommend against it:

Sorry, recommend against what?

> at least for ARM and arm64, I
> have already contributed SIMD based implementations that use SIMD
> permutation instructions and overlapping loads and stores to perform
> the ciphertext stealing, which means that there is only a single layer
> which implements CTS+CBC+AES, and this layer can consume the entire
> scatterlist in one go. We could easily do something similar in the
> AES-NI driver as well.

Can you point me at that in the sources?

Can you also do SHA at the same time in the same loop? Note that the rfc3962
AES enctypes do the checksum over the plaintext, but rfc8009 does it over the
ciphertext.

David
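Ard's point that the final full block and the trailing partial block must reach the cipher as one chunk can be seen in a small sketch of CBC with ciphertext stealing. This is a minimal illustration in C, assuming the "CS3" layout (last two ciphertext blocks swapped), which as far as I know is what the kernel's cts template and RFC 3962 use. `toy_enc`/`toy_dec` are hypothetical stand-ins for AES (invertible but NOT secure), and none of the function names here are kernel crypto API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 16

/* HYPOTHETICAL toy block cipher standing in for AES: rotate the block left
 * by one byte, then XOR with the key. Invertible, NOT secure.
 * in and out must not overlap. */
static void toy_enc(const unsigned char key[BLK], const unsigned char in[BLK],
                    unsigned char out[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[i] = in[(i + 1) % BLK] ^ key[i];
}

static void toy_dec(const unsigned char key[BLK], const unsigned char in[BLK],
                    unsigned char out[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[(i + 1) % BLK] = in[i] ^ key[i];
}

static void xor_blk(unsigned char *dst, const unsigned char *a,
                    const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

/* CBC-CTS encrypt (CS3 layout). Requires len > BLK; in/out must not overlap. */
void cts_cbc_encrypt(const unsigned char key[BLK], const unsigned char iv[BLK],
                     const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char prev[BLK], tmp[BLK], e_pen[BLK], pad[BLK] = {0};

    assert(len > BLK);                          /* this sketch needs >= 2 blocks */
    size_t nblk = (len + BLK - 1) / BLK;        /* blocks, counting the partial one */
    size_t tail = len - (nblk - 1) * BLK;       /* bytes in the final block: 1..BLK */
    memcpy(prev, iv, BLK);

    /* Ordinary CBC for every block except the final two. */
    for (size_t i = 0; i + 2 < nblk; i++) {
        xor_blk(tmp, in + i * BLK, prev, BLK);
        toy_enc(key, tmp, out + i * BLK);
        memcpy(prev, out + i * BLK, BLK);
    }

    /* Final full block P_{n-1} and partial block P_n, handled as one unit. */
    size_t pen = (nblk - 2) * BLK, last = (nblk - 1) * BLK;
    xor_blk(tmp, in + pen, prev, BLK);
    toy_enc(key, tmp, e_pen);                   /* E_{n-1} */
    memcpy(pad, in + last, tail);               /* P_n, zero-padded */
    xor_blk(tmp, pad, e_pen, BLK);
    toy_enc(key, tmp, out + pen);               /* full block is emitted first, */
    memcpy(out + last, e_pen, tail);            /* then the stolen prefix of E_{n-1} */
}

/* CBC-CTS decrypt (CS3 layout). Requires len > BLK; in/out must not overlap. */
void cts_cbc_decrypt(const unsigned char key[BLK], const unsigned char iv[BLK],
                     const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char prev[BLK], tmp[BLK], d[BLK], e_pen[BLK];

    assert(len > BLK);
    size_t nblk = (len + BLK - 1) / BLK;
    size_t tail = len - (nblk - 1) * BLK;
    memcpy(prev, iv, BLK);

    for (size_t i = 0; i + 2 < nblk; i++) {
        toy_dec(key, in + i * BLK, tmp);
        xor_blk(out + i * BLK, tmp, prev, BLK);
        memcpy(prev, in + i * BLK, BLK);
    }

    size_t pen = (nblk - 2) * BLK, last = (nblk - 1) * BLK;
    toy_dec(key, in + pen, d);                  /* d = pad(P_n) ^ E_{n-1} */
    memcpy(e_pen, in + last, tail);             /* stolen prefix of E_{n-1} */
    memcpy(e_pen + tail, d + tail, BLK - tail); /* padding was zero, so d holds the rest */
    xor_blk(out + last, d, e_pen, tail);        /* P_n */
    toy_dec(key, e_pen, tmp);
    xor_blk(out + pen, tmp, prev, BLK);         /* P_{n-1} */
}
```

The ciphertext is exactly as long as the plaintext. Producing the last full ciphertext block needs E_{n-1}, and recovering P_{n-1} needs bytes pulled back out of that last full block, so a driver handed the two tail blocks in separate calls cannot complete either direction.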