On Thu, Dec 12, 2024, Ivan Orlov wrote: > On Wed, Dec 11, 2024 at 05:01:07PM -0800, Sean Christopherson wrote: > > > Hm, by the way, what is the desired behaviour if EMULTYPE_ALLOW_RETRY_PF is > > > set? Is it correct that we return an internal error if it is set during > > > vectoring? Or KVM may try to unprotect the page and re-execute? > > > > Heh, it's sneaky, but EMULTYPE_ALLOW_RETRY_PF can be set if and only if > > RET_PF_WRITE_PROTECTED is set. Hmm, that makes me think we should do the below > > (EMULTYPE_WRITE_PF_TO_SP was a recent addition). > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index 2e713480933a..de5f6985d123 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -9077,7 +9077,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, > > > > if ((emulation_type & EMULTYPE_ALLOW_RETRY_PF) && > > (WARN_ON_ONCE(is_guest_mode(vcpu)) || > > - WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))) > > + WARN_ON_ONCE(!(emulation_type & EMULTYPE_WRITE_PF_TO_SP)))) > > emulation_type &= ~EMULTYPE_ALLOW_RETRY_PF; > > > > What if we are handling a write to write-protected page, but not a write to > the page table? We still can retry after unprotecting the page, so > EMULTYPE_ALLOW_RETRY_PF should be enabled, right? Gah, I got my enums mixed up. I conflated RET_PF_WRITE_PROTECTED with EMULTYPE_WRITE_PF_TO_SP. Ignore the above. FWIW, KVM _can't_ unprotect and retry in the EMULTYPE_WRITE_PF_TO_SP case. From kvm_unprotect_and_retry_on_failure(): /* * If the failed instruction faulted on an access to page tables that * are used to translate any part of the instruction, KVM can't resolve * the issue by unprotecting the gfn, as zapping the shadow page will * result in the instruction taking a !PRESENT page fault and thus put * the vCPU into an infinite loop of page faults. E.g. KVM will create * a SPTE and write-protect the gfn to resolve the !PRESENT fault, and * then zap the SPTE to unprotect the gfn, and then do it all over * again. Report the error to userspace. */ if (emulation_type & EMULTYPE_WRITE_PF_TO_SP) return false; > > r = kvm_check_emulate_insn(vcpu, emulation_type, insn, insn_len); > > > > That said, let me get back to you on this when my brain is less tired. I'm not > > sure emulating when an exit occurred during event delivery is _ever_ correct. > > > > I believe we can re-execute the instruction if exit happened during vectoring > due to exception (and if the address is not MMIO, e.g. when it's a write to > write-protected page, for instance when stack points to it). Unprotect and re-execute is fine, what I'm worried about is *successfully* emulating the instruction. E.g. 1. CPU executes instruction X and hits a #GP. 2. While vectoring the #GP, a shadow #PF is taken. 3. On VM-Exit, KVM re-injects the #GP (see __vmx_complete_interrupts()). 4. KVM emulates because of the write-protected page. 5. KVM "successfully" emulates and also detects the #GP 6. KVM synthesizes a #GP, and because the vCPU already has injected #GP, incorrectly escalates to a #DF. The above is a bit contrived, but I think it could happen if the guest reused a page that _was_ a page table, for a vCPU's kernel stack. > KVM unprotects the page, executes the instruction one more time and > (probably) gets this exception once again (but the page is already > unprotected, so vectoring succeeds without vmexit). If not > (e.g. exception conditions are not met anymore), guest shouldn't really > care and it can continue execution. > > However, I'm not sure what happens if vectoring is caused by external > interrupt: if we unprotect the page and re-execute the instruction, > will IRQ be delivered nonetheless, or it will be lost as irq is > already in ISR? Do we need to re-inject it in such a case? In all cases, the event that was being vectored is re-injected. Restarting from scratch would be a bug. E.g. if the cause of initial exception was "fixed", say because the initial exception was #BP, and the guest finished patching out the INT3, then restarting would execute the _new_ instruction, and the INT3 would be lost. In most cases, the guest would never notice, but it's still undesriable for KVM to effectively rewrite history. As far as unprotect+retry being viable, I think we're on the same page. What I'm getting at is that I think KVM should never allow emulating on #PF when the #PF occurred while vectoring. E.g. this: static inline bool kvm_can_emulate_event_vectoring(int emul_type) { return !(emul_type & EMULTYPE_PF); } and then I believe this? Where this diff can be a separate prep patch (though I'm pretty sure it's technically pointless without the vectoring angle, because shadow #PF can't coincide with any of the failure paths for kvm_check_emulate_insn()). diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 07c6f1d5323d..63361b2da450 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9107,6 +9107,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, if (r == X86EMUL_RETRY_INSTR || r == X86EMUL_PROPAGATE_FAULT) return 1; + if (kvm_unprotect_and_retry_on_failure(vcpu, cr2_or_gpa, + emulation_type)) + return 1; + if (r == X86EMUL_UNHANDLEABLE_VECTORING_IO) { kvm_prepare_event_vectoring_exit(vcpu, cr2_or_gpa); return 0;