On Sat, Sep 01, 2018 at 12:17:07AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@xxxxxxxxxx>
>
> Optimize ChaCha20 NEON performance by:
>
> - Implementing the 8-bit rotations using the 'vtbl.8' instruction.
> - Streamlining the part that adds the original state and XORs the data.
> - Making some other small tweaks.
>
> On ARM Cortex-A7, these optimizations improve ChaCha20 performance from
> about 12.08 cycles per byte to about 11.37 -- a 5.9% improvement.
>
> There is a tradeoff involved with the 'vtbl.8' rotation method since
> there is at least one CPU (Cortex-A53) where it's not fastest.  But it
> seems to be a better default; see the added comment.  Overall, this
> patch reduces Cortex-A53 performance by less than 0.5%.
>
> Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
> ---
>  arch/arm/crypto/chacha20-neon-core.S | 277 ++++++++++++++-------------
>  1 file changed, 143 insertions(+), 134 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
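For readers wondering why 'vtbl.8' can implement a rotation at all: rotating a 32-bit word left by 8 bits only moves whole bytes, so it is a pure byte permutation, which a byte table lookup can perform in one instruction per 64-bit register (ARMv7 NEON has no vector rotate, so a generic rotate needs a shift plus a shift-and-insert). The portable C model below is a sketch of that idea under stated assumptions, not code from the patch: the names rol8_idx, vtbl8_model and rotl8_two_lanes are hypothetical, lanes are assumed to use little-endian byte order, and the lookup models a single 8-byte table register.

```c
#include <assert.h>
#include <stdint.h>

/* Reference rotate-left of a 32-bit word by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32); /* avoid the undefined shift by 32 */
    return (x << n) | (x >> (32 - n));
}

/*
 * Hypothetical index table for rotl-by-8 on two little-endian 32-bit
 * lanes held in one 8-byte register: output byte i takes input byte
 * rol8_idx[i]. Per lane, bytes [b0 b1 b2 b3] become [b3 b0 b1 b2].
 */
static const uint8_t rol8_idx[8] = { 3, 0, 1, 2, 7, 4, 5, 6 };

/* Scalar model of a one-register byte table lookup: out-of-range
 * indices (>= 8) produce 0, as the VTBL instruction does. */
static void vtbl8_model(uint8_t out[8], const uint8_t table[8],
                        const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = idx[i] < 8 ? table[idx[i]] : 0;
}

/* Serialize/deserialize explicitly so the model does not depend on
 * the host's endianness. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Rotate both lanes left by 8 using only the byte shuffle. */
static void rotl8_two_lanes(uint32_t lanes[2])
{
    uint8_t in[8], out[8];

    store_le32(in, lanes[0]);
    store_le32(in + 4, lanes[1]);
    vtbl8_model(out, in, rol8_idx);
    lanes[0] = load_le32(out);
    lanes[1] = load_le32(out + 4);
}
```

For example, a lane holding 0x11223344 (bytes 44 33 22 11 in memory) becomes 0x22334411 after the shuffle, the same result as rotl32(0x11223344, 8). The same trick does not apply to ChaCha20's 12- and 7-bit rotations, since those do not move whole bytes.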