Re: [PATCH 0/5] crypto: add NEON-optimized BLAKE2b

Eric Biggers <ebiggers@xxxxxxxxxx> · Wed, 16 Dec 2020 19:54:18 -0800

On Wed, Dec 16, 2020 at 11:32:44PM +0100, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> On Wed, Dec 16, 2020 at 9:48 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > By the way, if people are interested in having my ARM scalar implementation of
> > BLAKE2s in the kernel too, I can send a patchset for that too.  It just ended up
> > being slower than BLAKE2b and SHA-1, so it wasn't as good for the use case
> > mentioned above.  If it were to be added as "blake2s-256-arm", we'd have:
> 
> I'd certainly be interested in this. Any rough idea how it performs
> for pretty small messages compared to the generic implementation?
> 100-140 byte ranges? Is the speedup about the same as for longer
> messages because this doesn't parallelize across multiple blocks?
> 

It does one block at a time, and there isn't much overhead, so yes, the speedup
on short messages should be about the same as on long messages.

I did a couple of quick userspace benchmarks and got (still on Cortex-A7):

	100-byte messages:
		BLAKE2s ARM:     28.9 cpb
		BLAKE2s generic: 42.4 cpb

	140-byte messages:
		BLAKE2s ARM:     29.5 cpb
		BLAKE2s generic: 44.0 cpb
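
For reference, the implied speedup can be worked out from the cycles-per-byte
figures above. This is only a rough sketch of the arithmetic, not part of the
measurements; the helper names (speedup, cycles_saved) are made up for
illustration, and the numbers are copied from the table as-is.

```python
# Cycles per byte (cpb) from the userspace benchmark above, keyed by
# message length in bytes. Lower is better.
CPB = {
    100: {"arm": 28.9, "generic": 42.4},
    140: {"arm": 29.5, "generic": 44.0},
}


def speedup(generic_cpb, arm_cpb):
    """Ratio of generic cost to ARM cost; a value above 1 means ARM is faster."""
    return generic_cpb / arm_cpb


def cycles_saved(generic_cpb, arm_cpb, msg_len):
    """Approximate cycles saved per message: cpb difference times length."""
    return (generic_cpb - arm_cpb) * msg_len


if __name__ == "__main__":
    for msg_len, cpb in CPB.items():
        ratio = speedup(cpb["generic"], cpb["arm"])
        saved = cycles_saved(cpb["generic"], cpb["arm"], msg_len)
        print(f"{msg_len}-byte messages: {ratio:.2f}x faster, "
              f"~{saved:.0f} cycles saved per message")
```

That works out to roughly 1.47x at 100 bytes and 1.49x at 140 bytes, i.e. the
ratio barely changes between the two message sizes.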

The results in the kernel may differ a bit, but probably not by much.

- Eric