On Mon, 7 Dec 2020 at 13:02, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> > > Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> > > plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> > > (admittedly, we can chain them, but we may have to do one or more large
> > > allocations).
> > >
> > > > However, I would recommend against it:
> > >
> > > Sorry, recommend against what?
> > >
> >
> > Recommend against the current approach of manipulating the input like
> > this and feeding it into the skcipher piecemeal.
>
> Right. I understand the problem, but as I mentioned above, the scatterlist
> itself becomes a performance issue as it may exceed two pages in size. Double
> that as there may need to be separate input and output scatterlists.
>

I wasn't aware that Herbert's work hadn't been merged yet. So that
means it is entirely reasonable to split the input like this and feed
the first part into a cbc(aes) skcipher and the last part into a
cts(cbc(aes)) skcipher, provided that you ensure that the last part
covers the final two blocks (one full block and one block that is
either full or partial).

With Herbert's changes, you will be able to use the same skcipher, and
pass a flag to all but the final part that more data is coming. But
for lack of that, the current approach is optimal for cases where
having to cover the entire input with a single scatterlist is
undesirable.

> > Herbert recently made some changes for MSG_MORE support in the AF_ALG
> > code, which permits a skcipher encryption to be split into several
> > invocations of the skcipher layer without the need for this complexity
> > on the side of the caller. Maybe there is a way to reuse that here.
> > Herbert?
>
> I wonder if it would help if the input buffer and output buffer didn't have to
> correspond exactly in usage - ie. the output buffer could be used at a slower
> rate than the input to allow for buffering inside the crypto algorithm.
>

I don't follow - how could one be used at a slower rate?

> > > Can you also do SHA at the same time in the same loop?
> >
> > SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD.
> > The former doesn't really fit the current API so we'd have to invent
> > something for it.
>
> The hashes corresponding to the kerberos enctypes I'm supporting are:
>
> HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
>
> HMAC-SHA256 for aes128-cts-hmac-sha256-128
>
> HMAC-SHA384 for aes256-cts-hmac-sha384-192
>
> CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
>
> I'm not sure you can support all of those with the instructions available.
>

It depends on whether the caller can make use of the authenc()
pattern, which is a type of AEAD we support. There are numerous
implementations of authenc(hmac(shaXXX),cbc(aes)), including h/w
accelerated ones, but none that implement ciphertext stealing. So that
means that, even if you manage to use the AEAD layer to perform both
at the same time, the generic authenc() template will perform the
cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
respectively, which won't give you any benefit until accelerated
implementations turn up that perform the whole operation in one pass
over the input. And even then, I don't think the performance benefit
will be worth it.
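
To make the cbc(aes) / cts(cbc(aes)) split mentioned earlier in this
reply concrete, here is a rough sketch of the length arithmetic only.
It is plain userspace C, not kernel code: the helper names
cts_tail_len() and cbc_head_len() and the AES_BLOCK_BYTES macro are
made up for illustration, and handing inputs of two blocks or less
entirely to the cts(cbc(aes)) request is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_BYTES 16u	/* AES block size in bytes */

/*
 * Hypothetical helper: number of trailing bytes to hand to the
 * cts(cbc(aes)) request. That part must cover the final two blocks:
 * one full block plus a last block that is either full or partial.
 */
static size_t cts_tail_len(size_t len)
{
	size_t last;

	/* Assumption: inputs of at most two blocks are not split at all. */
	if (len <= 2 * AES_BLOCK_BYTES)
		return len;

	/* Size of the final block, in the range 1..AES_BLOCK_BYTES. */
	last = (len - 1) % AES_BLOCK_BYTES + 1;

	return AES_BLOCK_BYTES + last;
}

/*
 * Hypothetical helper: number of leading bytes to hand to the plain
 * cbc(aes) request. Always a whole number of blocks.
 */
static size_t cbc_head_len(size_t len)
{
	size_t tail = cts_tail_len(len);
	size_t head;

	assert(tail <= len);
	head = len - tail;
	assert(head % AES_BLOCK_BYTES == 0);

	return head;
}
```

For a 100 byte input this gives an 80 byte cbc(aes) part and a 20 byte
cts(cbc(aes)) part. Note that the sketch says nothing about IV
handling: the second request has to continue the CBC chain, i.e. use
the last ciphertext block of the first part as its IV.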