On Wed, Apr 08, 2020 at 08:01:36PM +0200, Thomas Gleixner wrote:
> Paolo Bonzini <pbonzini@xxxxxxxxxx> writes:
> > On 08/04/20 17:34, Sean Christopherson wrote:
> >> On Wed, Apr 08, 2020 at 10:23:58AM +0200, Paolo Bonzini wrote:
> >>> Page-not-present async page faults are almost a perfect match for the
> >>> hardware use of #VE (and it might even be possible to let the processor
> >>> deliver the exceptions).
> >>
> >> My "async" page fault knowledge is limited, but if the desired behavior is
> >> to reflect a fault into the guest for select EPT Violations, then yes,
> >> enabling EPT Violation #VEs in hardware is doable. The big gotcha is that
> >> KVM needs to set the suppress #VE bit for all EPTEs when allocating a new
> >> MMU page, otherwise not-present faults on zero-initialized EPTEs will get
> >> reflected.
> >>
> >> Attached a patch that does the prep work in the MMU. The VMX usage would be:
> >>
> >>     kvm_mmu_set_spte_init_value(VMX_EPT_SUPPRESS_VE_BIT);
> >>
> >> when EPT Violation #VEs are enabled. It's 64-bit only as it uses stosq to
> >> initialize EPTEs. 32-bit could also be supported by doing memcpy() from
> >> a static page.
> >
> > The complication is that (at least according to the current ABI) we
> > would not want #VE to kick if the guest currently has IF=0 (and possibly
> > CPL=0). But the ABI is not set in stone, and anyway the #VE protocol is
> > a decent one and worth using as a base for whatever PV protocol we design.
>
> Forget the current pf async semantics (or the lack of). You really want
> to start from scratch and ignore the whole thing.
>
> The charm of #VE is that the hardware can inject it and it's not nesting
> until the guest cleared the second word in the VE information area. If
> that word is not 0 then you get a regular vmexit where you suspend the
> vcpu until the nested problem is solved.

So IIUC, only one process on a vcpu can afford to relinquish the cpu to
another task. If the next task also triggers an EPT violation, that will
result in a VM exit (as the previous #VE is not complete yet) and the vcpu
will be halted.

> So you really don't worry about the guest CPU state at all. The guest
> side #VE handler has to decide what it wants from the host depending on
> its internal state:
>
>   - Suspend me and resume once the EPT fail is solved
>
>   - Let me park the failing task and tell me once you resolved the
>     problem.
>
> That's pretty straightforward and avoids the whole nonsense which the
> current mess contains. It completely avoids the allocation stuff as well
> as you need to use a PV page where the guest copies the VE information
> to.
>
> The notification that a problem has been resolved needs to go through a
> separate vector which still has the IF=1 requirement obviously.

How is this vector decided between guest and host? Will a failure to fault
in the page be communicated through the same vector?

Thanks
Vivek
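
For reference, the nesting rule discussed above can be sketched as a small
user-space C model (not kernel code). The field layout is meant to follow
the virtualization-exception information area as described in the Intel
SDM; the names ve_info, busy, model_ept_violation() and guest_ve_done() are
hypothetical and used only for illustration, as is the idea of a separate
PV page that the guest copies into.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the #VE information area (names are illustrative). */
struct ve_info {
    uint32_t exit_reason;            /* 48 for an EPT violation */
    uint32_t busy;                   /* "second word": nonzero blocks further #VEs */
    uint64_t exit_qualification;
    uint64_t guest_linear_address;
    uint64_t guest_physical_address;
    uint16_t eptp_index;
};

enum ept_outcome { DELIVER_VE, VM_EXIT };

/* What happens on a convertible EPT violation in this model. */
static enum ept_outcome model_ept_violation(struct ve_info *info, uint64_t gpa)
{
    /* A previous #VE is still being handled: fall back to a VM exit. */
    if (info->busy != 0)
        return VM_EXIT;

    info->exit_reason = 48;
    info->busy = 0xFFFFFFFFu;        /* set on delivery, cleared by the guest */
    info->guest_physical_address = gpa;
    return DELIVER_VE;
}

/*
 * Guest side: copy the information to a separate page (assumed to be the
 * PV page mentioned in the thread), then clear the second word so that
 * the next EPT violation can be delivered as a #VE again.
 */
static void guest_ve_done(struct ve_info *info, struct ve_info *pv_copy)
{
    assert(info->busy != 0);         /* only valid while a #VE is pending */
    *pv_copy = *info;
    info->busy = 0;
}
```

In this model a second violation before guest_ve_done() runs yields VM_EXIT,
which is the case asked about above: the host would have to suspend the vcpu
until the first fault is resolved.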