On 7 February 2014 03:23, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 06, 2014 at 01:25:01PM +0100, Ard Biesheuvel wrote:
>> My apologies if this has been discussed/debated before on linux-crypto.
>>
>> When working on accelerated crypto for ARM and arm64, I noticed that many of
>> the existing accelerated implementations for other architectures duplicate much
>> of the chaining modes, not because they can be accelerated themselves but mainly
>> because the generic chaining mode implementations cannot present the data in
>> large enough chunks for the accelerated implementations to reach their optimal
>> speed.
>>
>> This series proposes a way to improve on that. I have only implemented the CBC
>> example because it makes for a nice benchmark, but CTR and XTS are other obvious
>> candidates for the treatment.
>>
>> I have included my arm64 AES cipher implementation for reference.
>
> We can already do this using the existing blkcipher interface
> if the underlying accelerated implementation exports an ECB
> version of itself.
>

That had occurred to me as well.

> So if we're going to do this I'd like to see CBC/CTR/XTS simply
> be modified to use ecb(X) instead of X where appropriate.
>

I agree that it would be trivial for cbc(%s) to probe for ecb(%s)
before settling on using plain %s. But how do we probe for an
/accelerated/ ecb(%s), i.e., how do we avoid using the generic ecb(%s),
which adds nothing but overhead?

The other issue is how to find out what the optimal chunk size is for
the accelerated ecb(%s) implementation, which would involve adding a
struct member that holds the preferred number of blocks presented in a
single invocation.
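To make the probing question concrete, here is a minimal userspace sketch in C. It is not kernel code and uses no real crypto API calls: struct ecb_alg_model, its preferred_blocks member, and probe_accel_ecb() are hypothetical names made up for this illustration. It assumes each registered ecb(%s) implementation advertises how many blocks it would like per invocation, with 1 meaning it gains nothing from larger chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for a registered ECB implementation.
 * None of these fields mirror an actual kernel structure.
 */
struct ecb_alg_model {
	const char *name;              /* algorithm name, e.g. "ecb(aes)" */
	const char *driver;            /* implementation name (made up here) */
	unsigned int preferred_blocks; /* assumed member: 1 = no multi-block gain */
};

/*
 * Look for an ecb(<cipher>) implementation that advertises a preferred
 * chunk size of more than one block. Returns the candidate with the
 * largest preferred chunk, or NULL if only single-block (generic)
 * implementations exist, in which case the caller would fall back to
 * driving the plain cipher one block at a time.
 */
static const struct ecb_alg_model *
probe_accel_ecb(const struct ecb_alg_model *algs, size_t n, const char *cipher)
{
	const struct ecb_alg_model *best = NULL;
	char want[64];
	size_t i;
	int len;

	assert(algs != NULL || n == 0);

	/* Build the name to match, mirroring the ecb(%s) template. */
	len = snprintf(want, sizeof(want), "ecb(%s)", cipher);
	if (len < 0 || (size_t)len >= sizeof(want))
		return NULL;

	for (i = 0; i < n; i++) {
		if (strcmp(algs[i].name, want) != 0)
			continue;
		/* Skip implementations that add nothing but overhead. */
		if (algs[i].preferred_blocks <= 1)
			continue;
		if (!best || algs[i].preferred_blocks > best->preferred_blocks)
			best = &algs[i];
	}
	return best;
}
```

In this model the same member answers both questions: a value above 1 marks the implementation as worth using, and the value itself is the chunk size the chaining mode would hand over per call.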
In fact, that would solve both issues, as the probe could check this
struct member for a >1 value (as my current series does, but against a
cipher_alg instance).

The downside is that this adds some overhead in handling the
scatterlists: even though the cbc mode handler will only ever call the
ecb mode handler with exactly the preferred number of blocks
contiguously in memory (thanks to the outer
blkcipher_walk_virt_block()), the ecb implementation, being a blkcipher
instance, cannot make any assumptions, so it needs its own level of
blkcipher walking to traverse the data.

Would there be any way around that?

--
Ard.