On Fri, Dec 09, 2016 at 01:47:26PM +0000, Ard Biesheuvel wrote:
> The bit-sliced NEON implementation of AES only performs optimally if
> it can process 8 blocks of input in parallel. This is due to the nature
> of bit slicing, where the n-th bit of each byte of AES state of each input
> block is collected into NEON register 'n', for registers q0 - q7.
>
> This implies that the amount of work for the transform is fixed,
> regardless of whether we are handling just one block or 8 in parallel.
>
> So let's try a bit harder to iterate over the input in suitably sized
> chunks, by increasing the chunksize to 8 * AES_BLOCK_SIZE, and tweaking
> the loops to only process multiples of the chunk size, unless we are
> handling the last chunk in the input stream.
>
> Note that the skcipher walk API guarantees that a step in the walk never
> returns less than 'chunksize' bytes if there are at least that many bytes
> of input still available. However, it does *not* guarantee that those steps
> produce an exact multiple of the chunk size.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>

I like this patch.  However, I had different plans for the chunksize
attribute.  It's primarily meant to be a hint to the upper layer in
case it does partial updates.  It's meant to provide the minimum
number of bytes a partial update can carry without screwing up
subsequent updates.

It just happens to be the same value that we were using during
an skcipher walk.

So I think for your case we should add a new attribute, perhaps
walk_chunksize or walksize, which doesn't need to be exported to
the outside at all and can then be used by the walk interface.
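For illustration only, the split between the two attributes could be
modelled roughly as below.  This is a userspace sketch, not kernel code
and not the eventual implementation: struct alg_attrs and step_len() are
hypothetical names, and walksize is just one of the two names floated
above.  It assumes, as the quoted patch description does for chunksize,
that a non-final walk step offers at least walksize bytes.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical attribute pair; 'walksize' is the proposed new field. */
struct alg_attrs {
	size_t chunksize;	/* exported: minimum size of a partial update */
	size_t walksize;	/* internal: preferred granularity of a walk step */
};

/*
 * Decide how many bytes to process in one walk step.
 *
 * avail:     bytes offered by the current step
 * remaining: bytes left in the whole request (avail <= remaining)
 *
 * A non-final step is rounded down to a multiple of walksize so the
 * 8-way code always sees full batches; the final step takes whatever
 * is left, including a partial batch.
 */
size_t step_len(const struct alg_attrs *a, size_t avail, size_t remaining)
{
	assert(a->walksize > 0);
	assert(avail <= remaining);

	if (avail == remaining)
		return remaining;		/* last step: take the tail too */

	return avail - avail % a->walksize;	/* round down to a full batch */
}
```

With chunksize = AES_BLOCK_SIZE and walksize = 8 * AES_BLOCK_SIZE, a
step offering 300 bytes out of 1000 remaining would process 256 and
leave the other 44 for the next step, while the exported chunksize stays
at a single block.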

Thanks,
--
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt