Re: [PATCH] crypto: arm/chacha-neon - optimize for non-block size multiples

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Mon, 2 Nov 2020 01:30:25 +0100

Cool patch! I look forward to getting out the old arm32 rig and
benching this. One question:

On Sun, Nov 1, 2020 at 5:33 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> On out-of-order microarchitectures such as Cortex-A57, this results in
> a speedup for 1420 byte blocks of about 21%, without any significant
> performance impact of the power-of-2 block sizes. On lower end cores
> such as Cortex-A53, the speedup for 1420 byte blocks is only about 2%,
> but also without impacting other input sizes.

A57 and A53 are 64-bit, but this is code for 32-bit arm, right? So the
comparison is more like A15 vs A5? Or are you running 32-bit kernels
on armv8 hardware?