On Sun, 2018-01-21 at 14:27 -0800, Linus Torvalds wrote: > On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote: > >> > >> The patches do things like add the garbage MSR writes to the kernel > >> entry/exit points. That's insane. That says "we're trying to protect > >> the kernel". We already have retpoline there, with less overhead. > > > > You're looking at IBRS usage, not IBPB. They are different things. > > Ehh. Odd intel naming detail. > > If you look at this series, it very much does that kernel entry/exit > stuff. It was patch 10/10, iirc. In fact, the patch I was replying to > was explicitly setting that garbage up. > > And I really don't want to see these garbage patches just mindlessly > sent around. I think we've covered the technical part of this now, not that you like it — not that any of us *like* it. But since the peanut gallery is paying lots of attention it's probably worth explaining it a little more for their benefit. This is all about Spectre variant 2, where the CPU can be tricked into mispredicting the target of an indirect branch. And I'm specifically looking at what we can do on *current* hardware, where we're limited to the hacks they can manage to add in the microcode. The new microcode from Intel and AMD adds three new features. One new feature (IBPB) is a complete barrier for branch prediction. After frobbing this, no branch targets learned earlier are going to be used. It's kind of expensive (order of magnitude ~4000 cycles). The second (STIBP) protects a hyperthread sibling from following branch predictions which were learned on another sibling. You *might* want this when running unrelated processes in userspace, for example. Or different VM guests running on HT siblings. The third feature (IBRS) is more complicated. It's designed to be set when you enter a more privileged execution mode (i.e. the kernel). It prevents branch targets learned in a less-privileged execution mode, BEFORE IT WAS MOST RECENTLY SET, from taking effect. But it's not just a 'set-and-forget' feature, it also has barrier-like semantics and needs to be set on *each* entry into the kernel (from userspace or a VM guest). It's *also* expensive. And a vile hack, but for a while it was the only option we had. Even with IBRS, the CPU cannot tell the difference between different userspace processes, and between different VM guests. So in addition to IBRS to protect the kernel, we need the full IBPB barrier on context switch and vmexit. And maybe STIBP while they're running. Then along came Paul with the cunning plan of "oh, indirect branches can be exploited? Screw it, let's not have any of *those* then", which is retpoline. And it's a *lot* faster than frobbing IBRS on every entry into the kernel. It's a massive performance win. So now we *mostly* don't need IBRS. We build with retpoline, use IBPB on context switches/vmexit (which is in the first part of this patch series before IBRS is added), and we're safe. We even refactored the patch series to put retpoline first. But wait, why did I say "mostly"? Well, not everyone has a retpoline compiler yet... but OK, screw them; they need to update. Then there's Skylake, and that generation of CPU cores. For complicated reasons they actually end up being vulnerable not just on indirect branches, but also on a 'ret' in some circumstances (such as 16+ CALLs in a deep chain). The IBRS solution, ugly though it is, did address that. Retpoline doesn't. There are patches being floated to detect and prevent deep stacks, and deal with some of the other special cases that bite on SKL, but those are icky too. And in fact IBRS performance isn't anywhere near as bad on this generation of CPUs as it is on earlier CPUs *anyway*, which makes it not quite so insane to *contemplate* using it as Intel proposed. That's why my initial idea, as implemented in this RFC patchset, was to stick with IBRS on Skylake, and use retpoline everywhere else. I'll give you "garbage patches", but they weren't being "just mindlessly sent around". If we're going to drop IBRS support and accept the caveats, then let's do it as a conscious decision having seen what it would look like, not just drop it quietly because poor Davey is too scared that Linus might shout at him again. :) I have seen *hand-wavy* analyses of the Skylake thing that mean I'm not actually lying awake at night fretting about it, but nothing concrete that really says it's OK. If you view retpoline as a performance optimisation, which is how it first arrived, then it's rather unconventional to say "well, it only opens a *little* bit of a security hole but it does go nice and fast so let's do it". But fine, I'm content with ditching the use of IBRS to protect the kernel, and I'm not even surprised. There's a *reason* we put it last in the series, as both the most contentious and most dispensable part. I'd be *happier* with a coherent analysis showing Skylake is still OK, but hey-ho, screw Skylake. The early part of the series adds the new feature bits and detects when it can turn KPTI off on non-Meltdown-vulnerable Intel CPUs, and also supports the IBPB barrier that we need to make retpoline complete. That much I think we definitely *do* want. There have been a bunch of us working on this behind the scenes; one of us will probably post that bit in the next day or so. I think we also want to expose IBRS to VM guests, even if we don't use it ourselves. Because Windows guests (and RHEL guests; yay!) do use it. If we can be done with the shouty part, I'd actually quite like to have a sensible discussion about when, if ever, we do IBPB on context switch (ptraceability and dumpable have both been suggested) and when, if ever, we set STIPB in userspace.
Attachment:
smime.p7s
Description: S/MIME cryptographic signature