> -----Original Message-----
> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Sent: Friday, September 27, 2019 4:06 AM
> To: Pascal Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxx>
> Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>; Linux Crypto Mailing List
> <linux-crypto@xxxxxxxxxxxxxxx>; Linux ARM <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>;
> Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>; David Miller <davem@xxxxxxxxxxxxx>;
> Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>; Jason A . Donenfeld <Jason@xxxxxxxxx>;
> Samuel Neves <sneves@xxxxxxxxx>; Dan Carpenter <dan.carpenter@xxxxxxxxxx>;
> Arnd Bergmann <arnd@xxxxxxxx>; Eric Biggers <ebiggers@xxxxxxxxxx>;
> Andy Lutomirski <luto@xxxxxxxxxx>; Will Deacon <will@xxxxxxxxxx>;
> Marc Zyngier <maz@xxxxxxxxxx>; Catalin Marinas <catalin.marinas@xxxxxxx>
> Subject: Re: [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet
> encryption
>
> On Thu, Sep 26, 2019 at 5:15 PM Pascal Van Leeuwen
> <pvanleeuwen@xxxxxxxxxxxxxx> wrote:
> >
> > But even the CPU only thing may have several implementations, of which
> > you want to select the fastest one supported by the _detected_ CPU
> > features (i.e. SSE, AES-NI, AVX, AVX512, NEON, etc. etc.)
> > Do you think this would still be efficient if that would be some
> > large if-else tree? Also, such a fixed implementation wouldn't scale.
>
> Just a note on this part.
>
> Yes, with retpoline a large if-else tree is actually *way* better for
> performance these days than even just one single indirect call. I
> think the cross-over point is somewhere around 20 if-statements.
>
Yikes, that is just _horrible_ :-(

_However_, there are many CPU architectures out there that _don't_ need
the retpoline mitigation and would be unfairly penalized by the deep
if-else tree (as opposed to the indirect branch) for a problem they did
not cause in the first place. Wouldn't it be fairer to impose the
penalty on the CPUs actually _causing_ this problem?
Also because those are generally the more powerful CPUs anyway, the
ones that would suffer the least from the overhead?

> But those kinds of things also are things that we already handle well
> with instruction rewriting, so they can actually have even less of an
> overhead than a conditional branch. Using code like
>
>     if (static_cpu_has(X86_FEATURE_AVX2))
>
> actually ends up patching the code at run-time, so you end up having
> just an unconditional branch. Exactly because CPU feature choices
> often end up being in critical code-paths where you have
> one-or-the-other kind of setup.
>
> And yes, one of the big users of this is very much the crypto library code.
>
Ok, I didn't know about that. So I suppose we could have something like

    if (static_soc_has(HW_CRYPTO_ACCELERATOR_XYZ)) ...

Hmmm ...

> The code to do the above is disgusting, and when you look at the
> generated code you see odd unreachable jumps and what looks like a
> slow "bts" instruction that does the testing dynamically.
>
> And then the kernel instruction stream gets rewritten fairly early
> during the boot depending on the actual CPU capabilities, and the
> dynamic tests get overwritten by a direct jump.
>
> Admittedly I don't think the arm64 people go to quite those lengths,
> but it certainly wouldn't be impossible there either. It just takes a
> bit of architecture knowledge and a strong stomach ;)
>
>               Linus

Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com
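
For reference, the two dispatch styles being compared in this thread can
be sketched in plain userspace C roughly as follows. This is only an
illustration under stated assumptions: the feature bits (FEAT_SSE,
FEAT_AVX2), the impl_* functions and the run_*/select_impl helpers are
all hypothetical names made up for the example, not kernel APIs, and the
run-time code patching done by static_cpu_has() is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; a real kernel would use X86_FEATURE_* etc. */
enum { FEAT_SSE = 1u << 0, FEAT_AVX2 = 1u << 1 };

/* Placeholder transform shared by all variants (NOT real crypto). */
static void xor_buf(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0x5a;
}

/* Stand-ins for differently optimised implementations of one operation.
 * Each returns an id so the caller can see which variant actually ran. */
static int impl_generic(uint8_t *buf, size_t len) { xor_buf(buf, len); return 0; }
static int impl_sse(uint8_t *buf, size_t len)     { xor_buf(buf, len); return 1; }
static int impl_avx2(uint8_t *buf, size_t len)    { xor_buf(buf, len); return 2; }

/* Style 1: if-else tree. Every call site is a direct call, reached via
 * conditional branches on the detected feature mask. */
static int run_if_else(unsigned int features, uint8_t *buf, size_t len)
{
	if (features & FEAT_AVX2)
		return impl_avx2(buf, len);
	else if (features & FEAT_SSE)
		return impl_sse(buf, len);
	return impl_generic(buf, len);
}

/* Style 2: pick an implementation once, then call through a pointer.
 * The call in run_indirect() is the indirect branch that a retpoline
 * build would make expensive. */
typedef int (*impl_fn)(uint8_t *buf, size_t len);

static impl_fn select_impl(unsigned int features)
{
	if (features & FEAT_AVX2)
		return impl_avx2;
	if (features & FEAT_SSE)
		return impl_sse;
	return impl_generic;
}

static int run_indirect(impl_fn fn, uint8_t *buf, size_t len)
{
	assert(fn != NULL);
	return fn(buf, len);
}
```

Both helpers pick the same variant for a given feature mask; the only
difference is whether the final transfer of control is a direct or an
indirect call, which is the cost being argued about above.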