Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Wed, 15 Aug 2018 23:31:11 -0700

Hi Eric,

On Tue, Aug 14, 2018 at 2:12 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> On ARM Cortex-A7, OpenSSL's ChaCha20 implementation is 13.9 cpb (cycles per
> byte), whereas Linux's is faster: 11.9 cpb.
>
> The reason Linux's ChaCha20 NEON implementation is faster than OpenSSL's
>
> I understand there are tradeoffs, and different implementations can be faster on
> different CPUs.
>
> So if your proposal goes in, I'd likely need to write a patch
> to get the old performance back, at least on Cortex-A7...

Yes, absolutely. Different CPUs behave differently indeed, but if you
have improvements for hardware that matters to you, we should
certainly incorporate these, and also loop Andy Polyakov in (I've
added him to the CC for the WIP v2). ChaCha is generally pretty
obvious, but for big integer algorithms -- like Poly1305 and
Curve25519 -- I think it's all the more important to involve Andy and
the rest of the world in general, so that Linux benefits from bug
research and fuzzing in places that are typically and classically
prone to nasty issues. In other words, let's definitely incorporate
your improvements after the patchset goes in, and at the same time
we'll try to bring Andy and others into the fold, where our
improvements can generally track each others.

> Also, I don't know whether Andy P. considered the 4xNEON implementation
> technique.  It could even be fastest on other ARM CPUs too, I don't know.

After v2, when he's CC'd in, let's plan to start discussing this with him.

Jason