On 11/11/2022 22:22, H. Peter Anvin wrote:
> On November 11, 2022 8:35:30 AM PST, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx> wrote:
>> On 11/11/2022 14:23, Peter Zijlstra wrote:
>>> On Fri, Nov 11, 2022 at 01:48:26PM +0100, Paolo Bonzini wrote:
>>>> On 11/11/22 13:19, Peter Zijlstra wrote:
>>>>> On Fri, Nov 11, 2022 at 01:04:27PM +0100, Paolo Bonzini wrote:
>>>>>> On Intel you can optionally make it hold onto IRQs, but NMIs are always
>>>>>> eaten by the VMEXIT and have to be reinjected manually.
>>>>> That 'optionally' thing worries me -- as in, KVM is currently
>>>>> opting-out?
>>>> Yes, because "If the “process posted interrupts” VM-execution control is 1,
>>>> the “acknowledge interrupt on exit” VM-exit control is 1" (SDM 26.2.1.1,
>>>> checks on VM-Execution Control Fields).  Ipse dixit.  Posted interrupts are
>>>> available and used on all processors since I think Ivy Bridge.
>> On server SKUs.  Client only got "virtual interrupt processing" fairly
>> recently IIRC, which is the CPU-side property which matters.
>>
>>> (imagine the non-coc compliant reaction here)
>>>
>>> So instead of fixing it, they made it worse :-(
>>>
>>> And now FRED is arguably making it worse again, and people wonder why I
>>> hate virt...
>> The only FRED-compatible fix is to send a self-NMI, because you
>> may need a CSL change too.
>>
>> VT-x *does* hold the NMI latch (for VMEXIT_REASON NMI), so it's self-NMI
>> and then enable_nmi()s.
>>
>> Except the IRET to self won't work - it will need to be ERETS-to-self.
>> Which I think is fine.
>>
>> But what isn't fine is the fact that a self-NMI doesn't deliver
>> synchronously, so you need to wait until it is pending, before enabling
>> NMIs.  (Well, actually you need to ensure that it's definitely delivered
>> before re-entering the VM).
>>
>> And I'm totally out of ideas here...
>>
>> ~Andrew
>>
> There is no fundamental reason to do a CSL/IST change if you happen to
> know a priori that the stack is in a valid state to have the NMI frame
> on it; that is:
>
> 1. Not deep into a nested I/O layer;
> 2. Valid, and not in flux in any way.
> 3. The NMI handler doesn't depend on being run on the alternate stack.
> Since this reinject will always be in a well-defined location, that's fine.
>
> So I think *that* concern is not actually an issue.
>
> Again, note that this is not a FRED-specific problem.

Hmm yeah.  On further consideration, I don't think FRED is relevant here
(outside of a few minor details).

The VMExit behaviour is simply that of the NMI handler but without an
exception frame on the stack.  The early asm is walking on egg-shells
with respect to the NMI latch, just like the regular NMI handler is.

Peter is correct that once you leave the VMExit handler's noinstr
region, a plethora of things can re-enable NMIs behind your back.  When
this happens in practice, you end up logically taking NMIs out of order.

Whether this matters or not is a different question.  Right now, NMI is
"just" an edge-triggered interrupt, but a theoretical future with NMI
vectors might have some fun causality bugs to contend with.

If out-of-order NMIs aren't a major concern, then a self-NMI is the
simple way to invoke the NMI handler in a context it can cope with.
Otherwise, the VMExit handler's noinstr region has to do the handoff
while it's in the same state that the NMI handler is expecting.

~Andrew
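
To make the ordering concern above concrete, here is a toy model.  It is
not kernel code and it is not how KVM implements anything: ToyCpu and
the two scenario functions are hypothetical names invented for this
illustration, and the NMI latch is assumed to be a single pending slot
with handler execution treated as atomic.  It only shows why the order
differs depending on whether the handoff happens before or after
something unblocks NMIs.

```python
class ToyCpu:
    """Toy model: NMI as an edge-triggered interrupt with a one-deep latch."""

    def __init__(self):
        self.blocked = False  # NMIs blocked (e.g. after an NMI VMExit)
        self.latched = None   # at most one NMI can be pending while blocked
        self.handled = []     # order in which the NMI handler logically ran

    def vmexit_on_nmi(self):
        """An NMI hits in guest context: the VMExit eats it, NMIs stay blocked."""
        self.blocked = True

    def raise_nmi(self, name):
        """A new NMI edge arrives from hardware."""
        if self.blocked:
            if self.latched is None:  # further edges collapse into one
                self.latched = name
        else:
            self.handled.append(name)  # delivered straight away

    def unblock(self):
        """Something re-enables NMIs; a latched NMI is delivered at once."""
        self.blocked = False
        if self.latched is not None:
            self.handled.append(self.latched)
            self.latched = None

    def handoff(self, name):
        """The host runs the NMI handler for the NMI that caused the VMExit."""
        self.handled.append(name)


def handoff_inside_noinstr():
    cpu = ToyCpu()
    cpu.vmexit_on_nmi()   # NMI "A" caused the exit
    cpu.raise_nmi("B")    # arrives while still blocked, so it is latched
    cpu.handoff("A")      # A is handled before anything can unblock NMIs
    cpu.unblock()         # B is delivered now
    return cpu.handled


def handoff_after_unblock():
    cpu = ToyCpu()
    cpu.vmexit_on_nmi()   # NMI "A" caused the exit
    cpu.unblock()         # something outside noinstr re-enabled NMIs
    cpu.raise_nmi("B")    # delivered immediately, ahead of A
    cpu.handoff("A")      # A is handled late
    return cpu.handled


if __name__ == "__main__":
    print(handoff_inside_noinstr())  # ['A', 'B']
    print(handoff_after_unblock())   # ['B', 'A']
```

In the second scenario the NMI that caused the VMExit is handled after a
later one, which is the "logically out of order" case.  With today's
single, vectorless NMI the handler cannot tell the difference.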