On 09/04/2020 13:47, Paolo Bonzini wrote:
> On 09/04/20 06:50, Andy Lutomirski wrote:
>> The small
>> (or maybe small) one is that any fancy protocol where the guest
>> returns from an exception by doing, logically:
>>
>> Hey I'm done; /* MOV somewhere, hypercall, MOV to CR4, whatever */
>> IRET;
>>
>> is fundamentally racy. After we say we're done and before IRET, we
>> can be recursively reentered. Hi, NMI!
> That's possible in theory. In practice there would be only two levels
> of nesting, one for the original page being loaded and one for the tail
> of the #VE handler. The nested #VE would see IF=0, resolve the EPT
> violation synchronously and both handlers would finish. For the tail
> page to be swapped out again, leading to more nesting, the host's LRU
> must be seriously messed up.
>
> With IST it would be much messier, and I haven't quite understood why
> you believe the #VE handler should have an IST.

Any interrupt/exception which can possibly occur between a SYSCALL and
re-establishing a kernel stack (a window of several instructions) must
be IST, to avoid taking said exception on a user stack, which would be
a trivial privilege escalation.

When #VE is used in its architecturally-expected way, it can in general
occur before the kernel stack is established, so it must be IST for
safety.

Therefore, it doesn't really matter if KVM's paravirt use of #VE does
respect the interrupt flag. It is not sensible to build a paravirt
interface using #VE whose safety depends on never turning on
hardware-induced #VEs.

~Andrew
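
For anyone following along, the SYSCALL window argument can be sketched as a toy model. This is only an illustration, not kernel code: `Cpu`, `Stack`, `syscall`, `load_kernel_stack` and `delivery_stack` are hypothetical names invented for this sketch. It assumes the usual 64-bit delivery rules: an IST entry always switches stacks, a CPL3 to CPL0 transition loads RSP0 from the TSS, and a same-privilege exception stays on the current RSP.

```python
from dataclasses import dataclass
from enum import Enum


class Stack(Enum):
    USER = "user stack"
    KERNEL = "kernel stack"
    IST = "IST stack"


@dataclass
class Cpu:
    cpl: int = 3              # current privilege level: 3 = user, 0 = kernel
    rsp: Stack = Stack.USER   # stack that RSP currently points into


def syscall(cpu):
    """SYSCALL enters CPL0 but leaves RSP untouched."""
    cpu.cpl = 0


def load_kernel_stack(cpu):
    """The entry code switches RSP itself, several instructions later."""
    cpu.rsp = Stack.KERNEL


def delivery_stack(cpu, uses_ist):
    """Return the stack an exception frame would be pushed onto."""
    if uses_ist:
        return Stack.IST      # IST forces a stack switch unconditionally
    if cpu.cpl == 3:
        return Stack.KERNEL   # CPL3 -> CPL0: hardware loads RSP0 from the TSS
    return cpu.rsp            # already CPL0: frame goes on the current RSP


if __name__ == "__main__":
    cpu = Cpu()
    syscall(cpu)
    # Inside the window: CPL0, but RSP still holds the user's value.
    print("no IST:", delivery_stack(cpu, uses_ist=False).value)
    print("IST:", delivery_stack(cpu, uses_ist=True).value)
    load_kernel_stack(cpu)
    print("after stack switch, no IST:", delivery_stack(cpu, uses_ist=False).value)
```

In this model, the only case where the frame lands on the user-controlled stack is a non-IST exception delivered between `syscall()` and `load_kernel_stack()`, which is the window described above.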