On 3 August 2018 at 17:47, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Jul 30, 2018 at 11:06:39PM +0200, Ard Biesheuvel wrote:
>> Update the combined AES-GCM AEAD implementation to process two blocks
>> at a time, allowing us to switch to a faster version of the GHASH
>> implementation.
>>
>> Note that this does not update the core GHASH transform, only the
>> combined AES-GCM AEAD mode. GHASH is mostly used with AES anyway, and
>> the ARMv8 architecture mandates support for AES instructions if
>> 64-bit polynomial multiplication instructions are implemented. This
>> means that most users of the pmull.p64 based GHASH routines are better
>> off using the combined AES-GCM code anyway. Users of the pmull.p8 based
>> GHASH implementation are unlikely to benefit substantially from aggregation,
>> given that the multiplication phase is much more dominant in this case
>> (and it is only the reduction phase that is amortized over multiple
>> blocks)
>>
>> Performance numbers for Cortex-A53 can be found after patches #2 and #3.
>>
>> Changes since v1:
>> - rebase to take the changes in patch 'crypto: arm64 - revert NEON yield for
>>   fast AEAD implementations' which I sent out on July 29th
>> - add a patch to reduce the number of invocations of kernel_neon_begin()
>>   and kernel_neon_end() on the common path
>>
>> Ard Biesheuvel (3):
>>   crypto/arm64: aes-ce-gcm - operate on two input blocks at a time
>>   crypto/arm64: aes-ce-gcm - implement 2-way aggregation
>>   crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable
>>
>>  arch/arm64/crypto/ghash-ce-core.S | 136 +++++++++------
>>  arch/arm64/crypto/ghash-ce-glue.c | 176 ++++++++++++--------
>>  2 files changed, 198 insertions(+), 114 deletions(-)
>
> All applied.  Thanks.

Thanks Herbert.
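
For reference, the 2-way aggregation mentioned in the cover letter can be
illustrated with a small model. The sketch below is not the kernel code (which
uses PMULL instructions in ghash-ce-core.S); it is a plain Python model,
assuming the standard GHASH definition over GF(2^128) with the polynomial
x^128 + x^7 + x^2 + x + 1. All function names (to_poly, clmul, reduce_poly,
ghash_serial, ghash_2way) are made up for this illustration. It shows that two
blocks can be multiplied by H^2 and H respectively, XORed while still
unreduced, and reduced once per pair instead of once per block.

```python
# Illustrative model only; helper names are hypothetical, not kernel symbols.
POLY = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def to_poly(block: bytes) -> int:
    """Map a 16-byte block to an int whose bit i is the coefficient of x^i.

    GHASH treats the leftmost bit of a block as the x^0 coefficient,
    so the 128 bits are reversed.
    """
    n = int.from_bytes(block, "big")
    return int(format(n, "0128b")[::-1], 2)


def from_poly(p: int) -> bytes:
    """Inverse of to_poly()."""
    return int(format(p, "0128b")[::-1], 2).to_bytes(16, "big")


def clmul(a: int, b: int) -> int:
    """Carry-less multiplication without reduction (result up to 255 bits)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_poly(p: int) -> int:
    """Reduce an unreduced product modulo POLY."""
    for i in range(p.bit_length() - 1, 127, -1):
        if (p >> i) & 1:
            p ^= POLY << (i - 128)
    return p


def ghash_serial(h: bytes, blocks: list) -> bytes:
    """One multiplication and one reduction per block."""
    hp, y = to_poly(h), 0
    for blk in blocks:
        y = reduce_poly(clmul(y ^ to_poly(blk), hp))
    return from_poly(y)


def ghash_2way(h: bytes, blocks: list) -> bytes:
    """Two multiplications but only one reduction per pair of blocks."""
    hp = to_poly(h)
    h2 = reduce_poly(clmul(hp, hp))  # H^2, computed once up front
    y = 0
    pairs = len(blocks) // 2
    for i in range(pairs):
        a = to_poly(blocks[2 * i])
        b = to_poly(blocks[2 * i + 1])
        # (y ^ a) * H^2  ^  b * H, summed before the single reduction
        y = reduce_poly(clmul(y ^ a, h2) ^ clmul(b, hp))
    if len(blocks) % 2:  # odd trailing block falls back to the serial step
        y = reduce_poly(clmul(y ^ to_poly(blocks[-1]), hp))
    return from_poly(y)


if __name__ == "__main__":
    key = bytes(range(16))  # arbitrary stand-in for the hash key H
    data = [bytes([i] * 16) for i in range(1, 6)]
    assert ghash_serial(key, data) == ghash_2way(key, data)
```

The two variants agree because multiplication distributes over XOR and the
reduction is linear. The multiplication count is unchanged, which matches the
remark in the cover letter that only the reduction phase is amortized.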