On 3/8/22 6:12 PM, Darren Tucker wrote:
> On Wed, 9 Mar 2022 at 09:59, rapier <rapier@xxxxxxx> wrote:
>> I was poking at the MAC routines looking for some efficiencies for high
>> performance environments. I was looking at umac.c and comparing it
>> to the original source at https://fastcrypto.org/front/umac/umac.c and,
>> after a couple of false starts, I found that reverting the endian
>> conversion routines back to what Krovetz wrote yielded an 8% to 16%
>> improvement.
> Interesting! One obvious difference is that what you have is potentially
> inline-able static functions instead of function calls across
> compilation units that (barring whole program optimization) can't be
> inlined. If you put the existing functions from misc.c into umac.c as
> statics, do you see the same improvement?
That worked and I saw the same improvement. For a 20GB test (a dd pipe
with aes256-ctr) I'm seeing peaks at 870MB/s versus 720MB/s for stock.
So it does look like it's being inlined. I'm going to poke at a
couple more things and then provide an updated patch. I think I have a
big-endian system around here somewhere, so I want to test on that as well.
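For anyone following along, here is a rough sketch of the kind of change being discussed: byte-order helpers defined as statics in the same file as their callers, so the compiler is free to inline them. The names load_u32_be, store_u32_be, load_u32_le and endian_selfcheck are made up for illustration; they are not the actual functions in misc.c or umac.c, and this is not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: hypothetical helper names, not the real misc.c API.
 * Because these are static and live in the same compilation unit as the
 * code that calls them, the compiler can inline them into hot loops.
 */
static inline uint32_t
load_u32_be(const void *vp)
{
	const unsigned char *p = (const unsigned char *)vp;

	/* Assemble a 32-bit value from big-endian byte order. */
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static inline void
store_u32_be(void *vp, uint32_t v)
{
	unsigned char *p = (unsigned char *)vp;

	/* Write the most significant byte first. */
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static inline uint32_t
load_u32_le(const void *vp)
{
	const unsigned char *p = (const unsigned char *)vp;

	/* Assemble a 32-bit value from little-endian byte order. */
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Small sanity check that gives the same answer on any host byte order. */
static void
endian_selfcheck(void)
{
	unsigned char buf[4];

	store_u32_be(buf, 0x01020304u);
	assert(buf[0] == 0x01 && buf[3] == 0x04);
	assert(load_u32_be(buf) == 0x01020304u);
	assert(load_u32_le(buf) == 0x04030201u);
}
```

The byte-at-a-time form avoids unaligned loads and behaves the same on little- and big-endian hosts. Whether the compiler actually inlines the helpers still depends on the optimization level, so it is worth checking the generated code.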
This is pleasing. Initially I was looking at improving performance by
pipelining the MAC, but that's not possible with ETM (encrypt-then-MAC).
This is about the level of performance gain I was hoping to get with that,
and it's a lot easier.
Anyway, I'll get the new patch up soon.
Chris
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev