Update the combined AES-GCM AEAD implementation to process two blocks at a
time, allowing us to switch to a faster version of the GHASH implementation.

Note that this does not update the core GHASH transform, only the combined
AES-GCM AEAD mode. GHASH is mostly used with AES anyway, and the ARMv8
architecture mandates support for AES instructions if 64-bit polynomial
multiplication instructions are implemented. This means that most users of
the pmull.p64 based GHASH routines are better off using the combined AES-GCM
code anyway. Users of the pmull.p8 based GHASH implementation are unlikely
to benefit substantially from aggregation, given that the multiplication
phase is much more dominant in this case (and it is only the reduction phase
that is amortized over multiple blocks).

Performance numbers for Cortex-A53 can be found after patches #2 and #3.

Changes since v1:
- rebase to take into account the changes in patch
  'crypto: arm64 - revert NEON yield for fast AEAD implementations'
  which I sent out on July 29th
- add a patch to reduce the number of invocations of kernel_neon_begin()
  and kernel_neon_end() on the common path

Ard Biesheuvel (3):
  crypto/arm64: aes-ce-gcm - operate on two input blocks at a time
  crypto/arm64: aes-ce-gcm - implement 2-way aggregation
  crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable

 arch/arm64/crypto/ghash-ce-core.S | 136 +++++++++------
 arch/arm64/crypto/ghash-ce-glue.c | 176 ++++++++++++--------
 2 files changed, 198 insertions(+), 114 deletions(-)

-- 
2.18.0
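
For readers unfamiliar with aggregation, the remark that only the reduction
phase is amortized over multiple blocks can be illustrated with a small
model. This is a sketch only, not the code in ghash-ce-core.S: it uses plain
polynomial bit order rather than GCM's bit-reflected convention, and the
function names (clmul, reduce, ghash_serial, ghash_2way) are made up for the
example. It relies on the identity ((Y ^ C1) * H ^ C2) * H ==
(Y ^ C1) * H^2 ^ C2 * H, so the two unreduced products can be XORed and
reduced once per pair of blocks.

```python
# Toy model of GHASH-style 2-way aggregation over GF(2^128).
# Assumption: plain (non-reflected) bit order, chosen for readability.

POLY = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def clmul(a, b):
    """Carry-less (polynomial) multiply with no modular reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce(x):
    """Reduce a polynomial of up to 255 bits modulo POLY."""
    for i in range(x.bit_length() - 1, 127, -1):
        if (x >> i) & 1:
            x ^= POLY << (i - 128)
    return x


def gf_mul(a, b):
    """Full field multiplication: one multiply, one reduction."""
    return reduce(clmul(a, b))


def ghash_serial(y, blocks, h):
    """One multiply and one reduction per block."""
    for c in blocks:
        y = gf_mul(y ^ c, h)
    return y


def ghash_2way(y, blocks, h):
    """Two multiplies but only one reduction per pair of blocks."""
    if len(blocks) % 2:
        raise ValueError("this sketch expects an even number of blocks")
    h2 = gf_mul(h, h)  # precomputed once per key
    for i in range(0, len(blocks), 2):
        c1, c2 = blocks[i], blocks[i + 1]
        # Reduction is linear, so the unreduced products can be summed first.
        y = reduce(clmul(y ^ c1, h2) ^ clmul(c2, h))
    return y


if __name__ == "__main__":
    h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E
    blocks = [0x0123456789ABCDEF0123456789ABCDEF * (i + 1) % (1 << 128)
              for i in range(4)]
    print(ghash_serial(0, blocks, h) == ghash_2way(0, blocks, h))
```

The number of multiplications per block is unchanged in the aggregated
version; only the reductions are halved. That is why an implementation whose
cost is dominated by the multiply step, as described above for pmull.p8,
stands to gain little from it.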