On Wed, Aug 14, 2024, Paolo Bonzini wrote: > On 8/9/24 21:03, Sean Christopherson wrote: > > @@ -6037,8 +6018,15 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, > > * execute the instruction. If no shadow pages were zapped, then the > > * write-fault is due to something else entirely, i.e. KVM needs to > > * emulate, as resuming the guest will put it into an infinite loop. > > + * > > + * For indirect MMUs, i.e. if KVM is shadowing the current MMU, try to > > + * unprotect the gfn and retry if an event is awaiting reinjection. If > > + * KVM emulates multiple instructions before completing even injection, > > + * the event could be delayed beyond what is architecturally allowed, > > + * e.g. KVM could inject an IRQ after the TPR has been raised. > > This paragraph should go before the description of > kvm_mmu_unprotect_gfn_and_retry: Hmm, I disagree. The comment ends up being disconnected from the code, especially by the end of the series. E.g. when reading kvm_mmu_write_protect_fault(), someone would have to jump twice (to kvm_mmu_unprotect_gfn_and_retry and then __kvm_mmu_unprotect_gfn_and_retry()) in order to understand the checks buried in kvm_mmu_write_protect_fault(). And the comment also becomes stale when kvm_mmu_unprotect_gfn_and_retry() is used by x86_emulate_instruction(). That's obviously solvable by extending the function comment, but then we end up with a rather massive function comment that documents its callers, not the function itself. > > * There are two cases in which we try to unprotect the page here > * preemptively, i.e. zap any shadow pages, before emulating the > * instruction. > * > * First, the access may be due to L1 accessing nested NPT/EPT entries > * used for L2, i.e. if the gfn being written is for gPTEs that KVM is > * shadowing and has write-protected. In this case, because AMD CPUs > * walk nested page table using a write operation, walking NPT entries > * in L1 can trigger write faults even when L1 isn't modifying PTEs. > * KVM would then emulate an excessive number of L1 instructions > * without triggering KVM's write-flooding detection, i.e. without > * unprotecting the gfn. This is detected as a RO violation while > * translating the guest page when the current MMU is direct. > * > * Second, for indirect MMUs, i.e. if KVM is shadowing the current MMU, > * unprotect the gfn and reenter the guest if an event is awaiting > * reinjection. If KVM emulates multiple instructions before completing > * event injection, the event could be delayed beyond what is > * architecturally allowed, e.g. KVM could inject an IRQ after the > * TPR has been raised. > * > * In both cases, if one or more shadow pages were zapped, skip > * emulation and resume L1 to let it natively execute the instruction. > * If no shadow pages were zapped, then the write-fault is due to > * something else entirely and KVM needs to emulate, as resuming > * the guest will put it into an infinite loop. > > Thanks, > > Paolo > > > */ > > - if (direct && (is_write_to_guest_page_table(error_code)) && > > + if (((direct && is_write_to_guest_page_table(error_code)) || > > + (!direct && kvm_event_needs_reinjection(vcpu))) && > > kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa)) > > return RET_PF_FIXED; >