On Tue, Apr 30, 2019 at 6:56 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Apr 29, 2019 at 01:07:33PM -0700, Linus Torvalds wrote:
> >
> > We still have that sti sysexit in the 32-bit code.
>
> We also have both: "STI; HLT" and "STI; MWAIT" where we rely on the STI
> shadow.

I guess the good news is that in all cases we really only ever protect
against a very unlikely race, and if the race happens it's not
actually fatal.

Yes, if we get an NMI and then an interrupt in between the "sti;hlt"
we might wait for the next interrupt and get a (potentially fairly
horrible) latency issue. I guess that with maximal luck it might be a
one-shot timer and not get re-armed, but it sounds very very very
unlikely.

Googling around, I actually find a patch from Avi Kivity from back in
2010 for this exact issue, apparently because kvm got this case wrong
and somebody hit it. The patch never made it upstream exactly because
kvm could be fixed and people decided that most real hardware didn't
have the issue in the first place.

In the discussion I found, Peter Anvin tried to get confirmation from
AMD engineers about this too, but I don't see any resolution.

Realistically, I don't think you can hit the problem in practice. The
only way to hit that incredibly small race of "one instruction, *both*
NMI and interrupts" is to have a lot of interrupts going all at the
same time, but that will also then solve the latency problem, so the
very act of triggering it will also fix it.

I don't see any case where it's really bad. The "sti sysexit" race is
similar, just about latency of user space signal reporting (and
perhaps any pending TIF_WORK_xyz flags).

So maybe we don't care deeply about the sti shadow. It's a potential
latency problem when broken, but not a huge issue. And for the
instruction rewriting hack, moving to "push+sti+ret" also makes a lost
sti shadow just a "possibly odd stack frame visibility" issue rather
than anything deeply fatal.

We can probably just write it off as "some old CPUs (and a smattering
of very rare and not relevant new ones) have potential but unlikely
latency issues because of a historical CPU mis-design - don't do perf
on them".

                Linus