On 09/04/2020 05:50, Andy Lutomirski wrote:
> On Wed, Apr 8, 2020 at 11:01 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> Paolo Bonzini <pbonzini@xxxxxxxxxx> writes:
>>> On 08/04/20 17:34, Sean Christopherson wrote:
>>>> On Wed, Apr 08, 2020 at 10:23:58AM +0200, Paolo Bonzini wrote:
>>>>> Page-not-present async page faults are almost a perfect match for the
>>>>> hardware use of #VE (and it might even be possible to let the processor
>>>>> deliver the exceptions).
>>>> My "async" page fault knowledge is limited, but if the desired behavior is
>>>> to reflect a fault into the guest for select EPT Violations, then yes,
>>>> enabling EPT Violation #VEs in hardware is doable. The big gotcha is that
>>>> KVM needs to set the suppress #VE bit for all EPTEs when allocating a new
>>>> MMU page, otherwise not-present faults on zero-initialized EPTEs will get
>>>> reflected.
>>>>
>>>> Attached a patch that does the prep work in the MMU. The VMX usage would be:
>>>>
>>>> kvm_mmu_set_spte_init_value(VMX_EPT_SUPPRESS_VE_BIT);
>>>>
>>>> when EPT Violation #VEs are enabled. It's 64-bit only as it uses stosq to
>>>> initialize EPTEs. 32-bit could also be supported by doing memcpy() from
>>>> a static page.
>>> The complication is that (at least according to the current ABI) we
>>> would not want #VE to kick if the guest currently has IF=0 (and possibly
>>> CPL=0). But the ABI is not set in stone, and anyway the #VE protocol is
>>> a decent one and worth using as a base for whatever PV protocol we design.
>> Forget the current pf async semantics (or the lack of). You really want
>> to start from scratch and ignore the whole thing.
>>
>> The charm of #VE is that the hardware can inject it and it's not nesting
>> until the guest cleared the second word in the VE information area. If
>> that word is not 0 then you get a regular vmexit where you suspend the
>> vcpu until the nested problem is solved.
> Can you point me at where the SDM says this?

Vol3 25.5.6.1 Convertible EPT Violations

> Anyway, I see two problems with #VE, one big and one small. The small
> (or maybe small) one is that any fancy protocol where the guest
> returns from an exception by doing, logically:
>
> Hey I'm done; /* MOV somewhere, hypercall, MOV to CR4, whatever */
> IRET;
>
> is fundamentally racy. After we say we're done and before IRET, we
> can be recursively reentered. Hi, NMI!

Correct. There is no way to atomically end the #VE handler. (This
causes "fun" even when using #VE for its intended purpose.)

~Andrew
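
For illustration only, here is a small user-space C model of the window
between "I'm done" and IRET. It is a sketch, not kernel or KVM code: the
struct and function names are invented for this example, and it assumes
the re-entry comes from a second convertible EPT violation rather than
the NMI mentioned above. The information-area layout follows the SDM
description (the 32-bit word at offset 4 is set to 0xFFFFFFFF on
delivery, and a #VE is only delivered while that word is 0).

```c
#include <assert.h>
#include <stdint.h>

/* #VE information area; field names are this sketch's own. */
struct ve_info {
    uint32_t exit_reason;  /* offset 0: 48 for an EPT violation */
    uint32_t busy;         /* offset 4: 0xFFFFFFFF while a #VE is outstanding */
    uint64_t exit_qual;    /* offset 8 */
    uint64_t gla;          /* offset 16: guest-linear address */
    uint64_t gpa;          /* offset 24: guest-physical address */
    uint16_t eptp_index;   /* offset 32 */
};

/* Hypothetical model of one vCPU: the info area plus a count of
 * #VE handler frames that have been entered but not yet IRET'ed. */
struct vcpu_model {
    struct ve_info info;
    int frames;
    int max_frames;
};

enum outcome { DELIVER_VE, VM_EXIT };

/* Model of the processor's choice for a convertible EPT violation. */
static enum outcome convertible_ept_violation(struct vcpu_model *v, uint64_t gpa)
{
    if (v->info.busy != 0)
        return VM_EXIT;            /* nested case: regular vmexit */

    v->info.exit_reason = 48;
    v->info.busy = 0xFFFFFFFFu;    /* blocks further #VE delivery */
    v->info.gpa = gpa;
    v->frames++;                   /* a new handler frame is now live */
    if (v->frames > v->max_frames)
        v->max_frames = v->frames;
    return DELIVER_VE;
}

/* "Hey I'm done": the guest clears the second word. */
static void handler_says_done(struct vcpu_model *v)
{
    assert(v->info.busy != 0);
    v->info.busy = 0;
}

/* IRET: the handler frame is finally popped. */
static void handler_iret(struct vcpu_model *v)
{
    assert(v->frames > 0);
    v->frames--;
}
```

In this model, a violation that arrives after handler_says_done() but
before handler_iret() is delivered as a fresh #VE while the first frame
is still on the stack, so max_frames reaches 2. A violation that arrives
while the word is still set takes the vmexit path instead.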