Hi Eric,

On Wed, Dec 16, 2020 at 9:48 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> By the way, if people are interested in having my ARM scalar implementation of
> BLAKE2s in the kernel too, I can send a patchset for that too. It just ended up
> being slower than BLAKE2b and SHA-1, so it wasn't as good for the use case
> mentioned above. If it were to be added as "blake2s-256-arm", we'd have:

I'd certainly be interested in this. Any rough idea how it performs
for pretty small messages compared to the generic implementation?
100-140 byte ranges? Is the speedup about the same as for longer
messages because this doesn't parallelize across multiple blocks?

Jason