On 09/04/20 16:13, Andrew Cooper wrote:
> On 09/04/2020 13:47, Paolo Bonzini wrote:
>> On 09/04/20 06:50, Andy Lutomirski wrote:
>>> The small
>>> (or maybe small) one is that any fancy protocol where the guest
>>> returns from an exception by doing, logically:
>>>
>>> Hey I'm done; /* MOV somewhere, hypercall, MOV to CR4, whatever */
>>> IRET;
>>>
>>> is fundamentally racy. After we say we're done and before IRET, we
>>> can be recursively reentered. Hi, NMI!
>> That's possible in theory. In practice there would be only two levels
>> of nesting, one for the original page being loaded and one for the tail
>> of the #VE handler. The nested #VE would see IF=0, resolve the EPT
>> violation synchronously and both handlers would finish. For the tail
>> page to be swapped out again, leading to more nesting, the host's LRU
>> must be seriously messed up.
>>
>> With IST it would be much messier, and I haven't quite understood why
>> you believe the #VE handler should have an IST.
>
> Any interrupt/exception which can possibly occur between a SYSCALL and
> re-establishing a kernel stack (several instructions), must be IST to
> avoid taking said exception on a user stack and being a trivial
> privilege escalation.

Doh, of course. I always confuse SYSCALL and SYSENTER.

> Therefore, it doesn't really matter if KVM's paravirt use of #VE does
> respect the interrupt flag. It is not sensible to build a paravirt
> interface using #VE who's safety depends on never turning on
> hardware-induced #VE's.

No, I think we wouldn't use a paravirt #VE at this point, we would use
the real thing if available.

It would still be possible to switch from the IST to the main kernel
stack before writing 0 to the reentrancy word.

Paolo
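For illustration only, the ordering in the last paragraph can be sketched as a toy C model. This is not kernel code: every name below (`ve_model`, `deliver_ve`, `switch_to_main_stack`, `tail_fault_clobbers_ist`) is hypothetical, and the model assumes that a nonzero reentrancy word suppresses #VE delivery and that an IST entry always starts at the same stack top, so a nested entry overwrites whatever the first handler still keeps there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: all names are hypothetical, none is a kernel symbol. */
enum stack_kind { ON_MAIN_STACK, ON_IST_STACK };

struct ve_model {
    uint32_t reentrancy_word;  /* nonzero: no further #VE is delivered */
    enum stack_kind stack;     /* stack the handler currently runs on */
    bool ist_frame_in_use;     /* handler state still lives in the IST slot */
    bool ist_frame_clobbered;  /* a nested #VE overwrote live handler state */
};

/* Model #VE delivery. Returns false when the nonzero word suppresses it
 * (assumed here: the host then resolves the EPT violation itself). */
static bool deliver_ve(struct ve_model *m)
{
    if (m->reentrancy_word != 0)
        return false;
    if (m->ist_frame_in_use)
        m->ist_frame_clobbered = true;  /* IST restarts at the same stack top */
    m->reentrancy_word = 0xFFFFFFFFu;   /* delivery marks the word nonzero */
    m->stack = ON_IST_STACK;
    m->ist_frame_in_use = true;
    return true;
}

/* Model moving the handler's frame off the IST stack. */
static void switch_to_main_stack(struct ve_model *m)
{
    m->stack = ON_MAIN_STACK;
    m->ist_frame_in_use = false;
}

/* Run one handler whose tail faults again right after the word is cleared.
 * Returns true if the nested #VE clobbered a live IST frame. */
static bool tail_fault_clobbers_ist(bool switch_stack_first)
{
    struct ve_model m = { 0, ON_MAIN_STACK, false, false };

    deliver_ve(&m);               /* original #VE, entered on the IST stack */
    if (switch_stack_first)
        switch_to_main_stack(&m); /* leave the IST before re-enabling #VE */
    m.reentrancy_word = 0;        /* "Hey I'm done" */
    deliver_ve(&m);               /* nested #VE arrives before IRET */
    return m.ist_frame_clobbered;
}
```

In this model, `tail_fault_clobbers_ist(false)` reports a clobbered frame because the word is cleared while the handler is still on the IST stack, whereas `tail_fault_clobbers_ist(true)` does not, since the nested #VE lands on an IST slot that is no longer in use.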