Re: [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 26 Sep 2019 19:06:08 -0700

On Thu, Sep 26, 2019 at 5:15 PM Pascal Van Leeuwen
<pvanleeuwen@xxxxxxxxxxxxxx> wrote:
>
> But even the CPU only thing may have several implementations, of which
> you want to select the fastest one supported by the _detected_ CPU
> features (i.e. SSE, AES-NI, AVX, AVX512, NEON, etc. etc.)
> Do you think this would still be efficient if that would be some
> large if-else tree? Also, such a fixed implementation wouldn't scale.

Just a note on this part.

Yes, with retpoline a large if-else tree is actually *way* better for
performance these days than even just one single indirect call. I
think the cross-over point is somewhere around 20 if-statements.
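
To make the comparison concrete, here is a minimal user-space sketch
of the two dispatch styles. Everything in it is an assumption made for
illustration: the FEAT_* bits, the sum_* variants, pick_sum() and
sum_dispatch() are hypothetical names, and the "variants" all run the
same scalar loop. It only shows the shape of the code, not measured
performance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for detected CPU features. */
#define FEAT_AVX2   (1u << 0)
#define FEAT_AVX512 (1u << 1)

/* Records which variant ran last, so the dispatch is observable. */
enum impl { IMPL_NONE, IMPL_GENERIC, IMPL_AVX2, IMPL_AVX512 };
static enum impl last_impl = IMPL_NONE;

/* Shared scalar loop; a real variant would use the matching SIMD code. */
static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
	uint32_t s = 0;

	while (n--)
		s += *p++;
	return s;
}

static uint32_t sum_generic(const uint8_t *p, size_t n)
{
	last_impl = IMPL_GENERIC;
	return sum_bytes(p, n);
}

static uint32_t sum_avx2(const uint8_t *p, size_t n)
{
	last_impl = IMPL_AVX2;
	return sum_bytes(p, n);
}

static uint32_t sum_avx512(const uint8_t *p, size_t n)
{
	last_impl = IMPL_AVX512;
	return sum_bytes(p, n);
}

typedef uint32_t (*sum_fn)(const uint8_t *, size_t);

/*
 * Style 1: choose a function pointer. Every later call through the
 * pointer is an indirect call, which a retpoline build routes through
 * a thunk.
 */
static sum_fn pick_sum(unsigned int feats)
{
	if (feats & FEAT_AVX512)
		return sum_avx512;
	if (feats & FEAT_AVX2)
		return sum_avx2;
	return sum_generic;
}

/*
 * Style 2: an if-else tree. Every call target is known at compile
 * time, so only conditional branches and direct calls are involved.
 */
static uint32_t sum_dispatch(unsigned int feats, const uint8_t *p, size_t n)
{
	if (feats & FEAT_AVX512)
		return sum_avx512(p, n);
	if (feats & FEAT_AVX2)
		return sum_avx2(p, n);
	return sum_generic(p, n);
}
```

Both styles select the same variant for the same feature mask; the
difference is only in whether the final call is indirect or direct.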

But we already handle those kinds of things well with instruction
rewriting, so they can actually have even less overhead than a
conditional branch. Using code like

  if (static_cpu_has(X86_FEATURE_AVX2))

actually ends up patching the code at run-time, so you end up with
just an unconditional branch. We do that exactly because CPU feature
choices often end up in critical code-paths where you have a
one-or-the-other kind of setup.

And yes, one of the big users of this is very much the crypto library code.

The code to do the above is disgusting, and when you look at the
generated code you see odd unreachable jumps and what looks like a
slow "bts" instruction that does the testing dynamically.

And then the kernel instruction stream gets rewritten fairly early
during the boot depending on the actual CPU capabilities, and the
dynamic tests get overwritten by a direct jump.
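
As a rough illustration of that sequence, here is a toy model in plain
C. It is not how the kernel does it: the real thing rewrites machine
instructions, while this sketch stores the state of each test site as
data. All names (struct feature_site, patch_sites() and so on) are
hypothetical.

```c
#include <stdbool.h>

/* What the "instruction" at a test site currently is. */
enum site_state {
	SITE_DYNAMIC,		/* not patched yet: test the bit at run-time */
	SITE_JUMP_TAKEN,	/* patched: feature present */
	SITE_JUMP_SKIPPED	/* patched: feature absent */
};

struct feature_site {
	enum site_state state;
	unsigned int feature_bit;	/* which feature this site tests */
};

/* Hypothetical: filled in by CPU detection before patching runs. */
static unsigned int detected_features;

/* Counts how often the slow dynamic test was executed. */
static unsigned int dynamic_tests_run;

static bool feature_present(unsigned int bit)
{
	return (detected_features >> bit) & 1u;
}

/* Evaluate one site: patched sites never look at the feature mask. */
static bool site_has_feature(const struct feature_site *s)
{
	switch (s->state) {
	case SITE_JUMP_TAKEN:
		return true;
	case SITE_JUMP_SKIPPED:
		return false;
	default:
		dynamic_tests_run++;
		return feature_present(s->feature_bit);
	}
}

/* One-time pass, standing in for the early-boot rewrite. */
static void patch_sites(struct feature_site *sites, int n)
{
	for (int i = 0; i < n; i++)
		sites[i].state = feature_present(sites[i].feature_bit) ?
				 SITE_JUMP_TAKEN : SITE_JUMP_SKIPPED;
}
```

Before patch_sites() runs, each site pays for the dynamic test; after
it, the answer is fixed and the test is never executed again.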

Admittedly I don't think the arm64 people go to quite those lengths,
but it certainly wouldn't be impossible there either.  It just takes a
bit of architecture knowledge and a strong stomach ;)

                 Linus