On Thu, Dec 17, 2020 at 07:55:16PM +0100, Ard Biesheuvel wrote:
> Counter mode is a stream cipher chaining mode that is typically used
> with inputs that are of arbitrary length, and so a tail block which
> is smaller than a full AES block is the rule rather than the exception.
>
> The current ctr(aes) implementation for arm64 always makes a separate
> call into the assembler routine to process this tail block, which is
> suboptimal, given that it requires reloading of the AES round keys,
> and prevents us from handling this tail block using the 5-way stride
> that we use for better performance on deep pipelines.
>
> So let's update the assembler routine so it can handle any input size,
> and uses NEON permutation instructions and overlapping loads and stores
> to handle the tail block. This results in a ~16% speedup for 1420 byte
> blocks on cores with deep pipelines such as ThunderX2.
>
> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> ---
>  arch/arm64/crypto/aes-glue.c  |  46 +++---
>  arch/arm64/crypto/aes-modes.S | 165 +++++++++++++-------
>  2 files changed, 137 insertions(+), 74 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
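For readers following the thread, the "overlapping loads and stores" idea from the quoted commit message can be sketched in portable C. This is only an illustration under stated assumptions, not the code in aes-modes.S: toy_keystream_block() is a hypothetical stand-in for AES (it is not a cipher), the names ctr_ref(), ctr_overlap() and BLK are invented here, and the memset/memcpy shift of the keystream plays the role that a NEON permutation would presumably play in the real routine. The sketch assumes the input is at least one full block long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16 /* AES block size in bytes */

/* Hypothetical stand-in for one AES-CTR keystream block. NOT a real cipher. */
static void toy_keystream_block(uint64_t ctr, uint8_t ks[BLK])
{
    for (int i = 0; i < BLK; i++)
        ks[i] = (uint8_t)(((ctr * 0x9E3779B97F4A7C15ULL) >> (8 * (i % 8)))
                          ^ (uint64_t)(i * 31));
}

/* Reference: byte-wise CTR that treats the partial tail block separately. */
static void ctr_ref(uint8_t *dst, const uint8_t *src, size_t len, uint64_t ctr)
{
    uint8_t ks[BLK];

    for (size_t off = 0; off < len; off += BLK, ctr++) {
        size_t n = len - off < BLK ? len - off : BLK;

        toy_keystream_block(ctr, ks);
        for (size_t i = 0; i < n; i++)
            dst[off + i] = src[off + i] ^ ks[i];
    }
}

/*
 * Overlapping variant: the tail is handled as one full 16-byte window that
 * ends exactly at the end of the buffer, so it overlaps the last full block.
 * Requires len >= BLK. Works in place (dst == src).
 */
static void ctr_overlap(uint8_t *dst, const uint8_t *src, size_t len, uint64_t ctr)
{
    size_t tail = len % BLK;
    size_t full = len / BLK;
    uint8_t ks[BLK], prev[BLK], last[BLK], shifted[BLK];

    assert(len >= BLK);

    /* All full blocks except the final one. */
    for (size_t b = 0; b + 1 < full; b++, ctr++) {
        toy_keystream_block(ctr, ks);
        for (int i = 0; i < BLK; i++)
            dst[b * BLK + i] = src[b * BLK + i] ^ ks[i];
    }

    /* Load the final full block and the overlapping tail window up front,
     * so that in-place operation does not read already-written output. */
    size_t off = (full - 1) * BLK;
    memcpy(prev, src + off, BLK);
    if (tail)
        memcpy(last, src + len - BLK, BLK);

    toy_keystream_block(ctr, ks);
    for (int i = 0; i < BLK; i++)
        prev[i] ^= ks[i];

    if (tail) {
        /* Shift the next keystream block so its first `tail` bytes line up
         * with the last `tail` bytes of the window; zero the rest. */
        toy_keystream_block(ctr + 1, ks);
        memset(shifted, 0, BLK - tail);
        memcpy(shifted + (BLK - tail), ks, tail);
        for (int i = 0; i < BLK; i++)
            last[i] ^= shifted[i];
        /* Store the tail window first: the overlap region is stale for now. */
        memcpy(dst + len - BLK, last, BLK);
    }

    /* Storing the final full block last overwrites the overlap correctly. */
    memcpy(dst + off, prev, BLK);
}
```

The store order is the point of the sketch: the tail window is written first, and the preceding full block is written afterwards so that the bytes the two stores share end up with the correct output. Inputs shorter than one block cannot use this trick as written and would need a separate path.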