> > > > > But what about NMIs, afaict this is all horribly broken for NMIs. > > > > > > > > > > So the whole VMX thing latches the NMI (which stops NMI > > > > > recursion), > > > right? > > > > > > > > > > But then you drop out of noinstr code, which means any random > > > > > exception can happen (kprobes #BP, hw_breakpoint #DB, or even > > > > > #PF due to random nonsense like *SAN). This exception will do > > > > > IRET and clear the NMI latch, all before you get to run any of the NMI > code. > > > > > > > > What you said here implies that we have this problem in the existing > code. > > > > Because a fake iret stack is created to call the NMI handler in > > > > the IDT NMI descriptor, which lastly executes the IRET instruction. > > > > > > I can't follow; of course the IDT handler terminates with IRET, it has to no? > > > > With FRED, ERETS/ERETU replace IRET, and use bit 28 of the popped CS > > field to control whether to unblock NMI. If bit 28 of the field (above > > the selector) is 1, ERETS/ERETU unblocks NMIs. > > Yes, I know that. It is one of the many improvements FRED brings. > Ideally the IBT WAIT-FOR-ENDBR state also gets squirreled away in the > hardware exception frame, but that's still up in the air I believe :/ > > Anyway.. given there is interrupt priority and NMI is pretty much on top of > everything else the reinject crap *should* run NMI first. That way NMI runs > with the latch disabled and whatever other pending interrupts will run later. > > But that all is still broken because afaict the current code also leaves noinstr -- > and once you leave noinstr (or use a static_key, static_call or anything else that > *can* change at runtime) you can't guarantee nothing. For NMI, HPA asked me to use "int $2", as it switches to the NMI IST stack to execute the NMI handler, essentially like how HW deals with a NMI in host. and I tested it with NMI watchdog, it looks working fine. For IRQs, we still use the dispatch table, but with a new func added in DEFINE_IDTENTRY_SYSVEC with the noinstr entry/exit code removed: #define DEFINE_IDTENTRY_SYSVEC(func) \ static void __##func(struct pt_regs *regs); \ \ __visible noinstr void func(struct pt_regs *regs) \ { \ irqentry_state_t state = irqentry_enter(regs); \ \ instrumentation_begin(); \ kvm_set_cpu_l1tf_flush_l1d(); \ run_sysvec_on_irqstack_cond(__##func, regs); \ instrumentation_end(); \ irqentry_exit(regs, state); \ } \ \ +void dispatch_table_##func(struct pt_regs *regs) \ +{ \ + kvm_set_cpu_l1tf_flush_l1d(); \ + run_sysvec_on_irqstack_cond(__##func, regs); \ +} + \ \ static noinline void __##func(struct pt_regs *regs) How do you think?