On Sat, Jan 13, 2018 at 11:33 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Jan 13, 2018 at 11:05 AM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> I _know_ that lfence is expensive as hell on P4, for example.
>>
>> Yes, yes, "sbb" is often more expensive than most ALU instructions,
>> and Agner Fog says it has a 10-cycle latency on Prescott (which is
>> outrageous, but being one or two cycles more due to the flags
>> generation is normal). So the sbb/and may certainly add a few cycles
>> to the critical path, but on Prescott "lfence" is *50* cycles
>> according to those same tables by Agner Fog.
>
> Side note: I don't think P4 is really relevant for a performance
> discussion, I was just giving it as an example where we do know actual
> cycles.
>
> I'm much more interested in modern Intel big-core CPU's, and just
> wondering whether somebody could ask an architect.
>
> Because I _suspect_ the answer from a CPU architect would be: "Christ,
> the sbb/and sequence is much better because it doesn't have any extra
> serialization", but maybe I'm wrong, and people feel that lfence is
> particularly easy to do right without any real downside.
>

From the last paragraph of this guidance:

https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

...I read that as Linux can constrain speculation with 'sbb; and'
instead of 'lfence', and any future changes will be handled like any
new cpu enabling.

To your specific question of the relative costs, sbb is
architecturally cheaper, so let's go with that approach.

For this '__uaccess_begin_nospec' patch set, at a minimum the kernel
needs a helper that can be easily grep'd when/if it needs changing in
a future kernel. It also indicates that the command line approach to
dynamically switch the mitigation mechanism is over-engineering.

That said, for get_user specifically, can we do something even
cheaper? Dave H. reminds me that any valid user pointer that gets past
the address limit check will have the high bit clear. So instead of
calculating a mask, just unconditionally clear the high bit. It seems
that, worst case, userspace can speculatively leak something that's
already in its address space.

I'll respin this set along those lines, and drop the ifence bits.
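
To make the two options concrete, here is a rough sketch in plain C of
the arithmetic involved. This is not the patch itself: the helper names
(index_mask, clamp_index, clear_high_bit) and ADDR_HIGH_BIT are made up
for illustration, and I'm assuming 64-bit addresses where a valid user
pointer has bit 63 clear. A real implementation would presumably want
the mask produced by a cmp/sbb pair in inline asm so that no branch is
involved; a C compiler is free to emit a branch for the comparison
below, so treat this as a description of the values, not of the
generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: returns all ones when
 * idx < size and 0 otherwise. This is the value an "sbb" of a
 * register with itself would leave behind after comparing idx
 * against size.
 */
static inline uint64_t index_mask(uint64_t idx, uint64_t size)
{
	return (uint64_t)0 - (uint64_t)(idx < size);
}

/* The "and" step: an out-of-bounds index collapses to 0. */
static inline uint64_t clamp_index(uint64_t idx, uint64_t size)
{
	return idx & index_mask(idx, size);
}

/* Assumption: 64-bit addresses, valid user pointers have bit 63 clear. */
#define ADDR_HIGH_BIT (UINT64_C(1) << 63)

/*
 * The cheaper get_user idea: skip the mask computation and drop the
 * high bit unconditionally. A pointer that already passed the address
 * limit check is unchanged; anything else loses its high bit.
 */
static inline uint64_t clear_high_bit(uint64_t addr)
{
	return addr & ~ADDR_HIGH_BIT;
}
```

With these, clamp_index(5, 8) stays 5 while clamp_index(9, 8) becomes
0, and clear_high_bit() leaves 0x00007fffdeadbeef alone but turns
0xffff880000001000 into 0x7fff880000001000. The second variant needs a
single 'and' with a constant, which is why it looks attractive for
get_user.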