On Sun, Jan 21, 2018 at 6:04 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Jan 21, 2018 at 5:38 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>
>> 3. What's with sbb; and? I can see two sane ways to do this. One is
>> cmovaq [something safe], %rax,
>
> Heh. I think it's partly about being old-fashioned. sbb has always
> been around, and is the traditional trick for 0/-1.
>
> Also, my original suggested thing did the *access* too, and masked the
> result with the same mask.
>
> But I guess we could use cmov instead. It has very similar performance
> (ie it was relatively slow on P4, but so was sbb).
>
> However, I suspect it actually has a slightly higher register
> pressure, since you'd need to have that zero register (zero being the
> "safe" value), and the only good way to get a zero value is the xor
> thing, which affects flags and thus needs to be before the cmp.
>
> In contrast, the sbb trick has no early inputs needed.
>
> So on the whole, 'cmov' may be more natural on a conceptual level, but
> the sbb trick really is a very "traditional x86 thing" to do.

Fair enough.

That being said, what I *actually* want to do is to nuke this thing entirely. I just wrote a patch to turn off the SYSCALL64 fast path entirely when retpolines are on. Then this issue can be dealt with in C. I assume someone has a brilliant way to make gcc automatically do something intelligent about guarded array access in C. </snicker>

Seriously, though, the retpolined fast path is actually slower than the slow path on a "minimal" retpoline kernel (which is what I'm using because Fedora hasn't pushed out a retpoline compiler yet), and I doubt it's more than the tiniest speedup on a full retpoline kernel.

I've read a bunch of emails flying around saying that retpolines aren't that bad. In very informal benchmarking, a single mispredicted ret (which is what a retpoline is) seems to take 20-30 cycles on Skylake. That's pretty far from "not bad".

Is IBRS somehow doing something that adversely affects code that doesn't use indirect branches? Because I'm having a bit of a hard time imagining IBRS hurting indirect branches worse than retpolines do.
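For reference, the 0/-1 mask that "cmp; sbb" produces is what makes a guarded array access safe under speculation: the index is ANDed with the mask, so even a mispredicted bounds check can only read slot 0. Below is a minimal C sketch of doing that in C, assuming gcc or clang (for the empty inline-asm optimizer barrier and an arithmetic right shift of a negative long). index_mask(), lookup() and table are hypothetical names for illustration; the mask formula is modeled loosely on the kernel's generic (non-x86) array_index_mask_nospec() fallback, not on the x86 asm discussed above.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Returns ~0UL when index < size and 0 otherwise, with no conditional
 * branch on index. On x86, "cmp %[size], %[index]; sbb %[mask], %[mask]"
 * yields the same 0/-1 value from the carry flag; this is a portable
 * spelling. Assumes gcc/clang (empty asm barrier, arithmetic >> on a
 * negative long).
 */
static inline unsigned long index_mask(unsigned long index, unsigned long size)
{
    /* Precondition of the trick; it checks size, not the untrusted index. */
    assert(size <= LONG_MAX);

    /*
     * Hide index's value from the optimizer so it cannot use an earlier
     * bounds check to fold the mask into a constant ~0UL.
     */
    __asm__("" : "+r"(index));

    /*
     * If index < size, both index and (size - 1 - index) have a clear top
     * bit, so ~(index | ...) is negative and the shift smears it to -1.
     * Otherwise one of the two has its top bit set and the result is 0.
     */
    return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Hypothetical table standing in for something like a syscall table. */
static const int table[] = {10, 20, 30, 40};
#define TABLE_LEN (sizeof(table) / sizeof(table[0]))

/* Hypothetical guarded lookup: architectural bounds check plus the mask. */
static int lookup(unsigned long nr)
{
    if (nr >= TABLE_LEN)
        return -1;
    /*
     * If the branch above is mispredicted, this may run speculatively
     * with an out-of-range nr; the mask then forces nr to 0.
     */
    nr &= index_mask(nr, TABLE_LEN);
    return table[nr];
}
```

Writing the clamp as `index < size ? index : 0` would be the cmov-flavored version, but C gives no guarantee that the compiler emits a cmov rather than a branch, which is why the sketch computes the mask arithmetically.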