> > > > > > And yes, the current code appears to suffer the same defect. > > > > That defect isn't going to be fixed simply by changing how KVM > > forwards NMIs though. IIUC, _everything_ between VM-Exit and the > > invocation of the NMI handler needs to be noinstr. On VM-Exit due to > > NMI, NMIs are blocked. If a #BP/#DB/#PF occurs before KVM gets to > > kvm_x86_handle_exit_irqoff(), the subsequent IRET will unblock NMIs > > before the original NMI is serviced, i.e. a second NMI could come in > > at anytime regardless of how KVM forwards the NMI to the kernel. > > > > Is there any way to solve this without tagging everything noinstr? > > There is a metric shit ton of code between VM-Exit and the handling of > > NMIs, and much of that code is common helpers. It might be possible > > to hoist NMI handler much earlier, though we'd need to do a super > > thorough audit to ensure all necessary host state is restored. > > As NMI is the only vector with this potential issue, it sounds a good idea to only > promote its handling. > Hi Peter/Sean, I prefer to move _everything_ between VM-Exit and the invocation of the NMI handler into the noinstr section in the next patch set, how do you think? Xin > > > > > > With FRED, ERETS/ERETU replace IRET, and use bit 28 of the > > > > > popped CS field to control whether to unblock NMI. If bit 28 of > > > > > the field (above the selector) is 1, ERETS/ERETU unblocks NMIs. > > > > Side topic, there's a bug in the ISE docs. Section "9.4.2 NMI > > Blocking" states that bit 16 holds the "unblock NMI" magic, which I'm > > guessing is a holdover from an earlier revision of FRED. > > Good catch, the latest 3.0 draft spec changed it to bit 28, but section 9.4.2 > didn't get a proper update. > > > > > As specified in Section 6.1.3 and Section 6.2.3, ERETS and ERETU > > each unblocks NMIs > > if bit 16 of the popped CS field is 1. The following items detail > > how this behavior may be > > changed in VMX non-root operation, depending on the settings of > > certain VM- execution > > controls: > > > > > > Yes, I know that. It is one of the many improvements FRED brings. > > > > Ideally the IBT WAIT-FOR-ENDBR state also gets squirreled away in > > > > the hardware exception frame, but that's still up in the air I > > > > believe :/ > > > > > > > > Anyway.. given there is interrupt priority and NMI is pretty much > > > > on top of everything else the reinject crap *should* run NMI first. > > > > That way NMI runs with the latch disabled and whatever other > > > > pending > > interrupts will run later. > > > > > > > > But that all is still broken because afaict the current code also > > > > leaves noinstr -- and once you leave noinstr (or use a static_key, > > > > static_call or anything else that > > > > *can* change at runtime) you can't guarantee nothing. > > > > > > For NMI, HPA asked me to use "int $2", as it switches to the NMI IST > > > stack to execute the NMI handler, essentially like how HW deals with > > > a NMI in host. and I tested it with NMI watchdog, it looks working fine. > > > > Heh, well yeah, because that's how KVM used to handle NMIs back before > > I reworked NMI handling to use the direct call method. Ironically, > > that original change was done in part to try and make it _easier_ to > > deal with FRED (back before FRED was publicly disclosed). > > > > If KVM reverts to INTn, the fix to route KVM=>NMI through the non-IST > > entry can be reverted too. > > > > a217a6593cec ("KVM/VMX: Invoke NMI non-IST entry instead of IST entry") > > 1a5488ef0dcf ("KVM: VMX: Invoke NMI handler via indirect call > > instead of > > INTn") > > Sure, I'm just thinking to put asm("int $2") in a new function exc_raise_nmi() > defined in arch/x86/kernel/traps.c. Thus we will need to change it only > whenever we have any better facility, and no KVM VMX change required.