On Wed, Aug 14, 2024 at 07:21:06AM -0700, Sean Christopherson wrote: > On Wed, Aug 14, 2024, Yuan Yao wrote: > > > @@ -5960,6 +5961,41 @@ void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new, > > > write_unlock(&vcpu->kvm->mmu_lock); > > > } > > > > > > +static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, > > > + u64 error_code, int *emulation_type) > > > +{ > > > + bool direct = vcpu->arch.mmu->root_role.direct; > > > + > > > + /* > > > + * Before emulating the instruction, check if the error code > > > + * was due to a RO violation while translating the guest page. > > > + * This can occur when using nested virtualization with nested > > > + * paging in both guests. If true, we simply unprotect the page > > > + * and resume the guest. > > > + */ > > > + if (direct && > > > + (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) { > > > + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)); > > > + return RET_PF_FIXED; > > > + } > > > + > > > + /* > > > + * The gfn is write-protected, but if emulation fails we can still > > > + * optimistically try to just unprotect the page and let the processor > > > + * re-execute the instruction that caused the page fault. Do not allow > > > + * retrying MMIO emulation, as it's not only pointless but could also > > > + * cause us to enter an infinite loop because the processor will keep > > > + * faulting on the non-existent MMIO address. Retrying an instruction > > > + * from a nested guest is also pointless and dangerous as we are only > > > + * explicitly shadowing L1's page tables, i.e. unprotecting something > > > + * for L1 isn't going to magically fix whatever issue cause L2 to fail. > > > + */ > > > + if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu)) > > > > Looks the mmio_info_in_cache() checking can be removed, > > emulation should not come here with RET_PF_WRITE_PROTECTED > > introduced, may WARN_ON_ONCE() it. > > Yeah, that was my instinct as well. I kept it mostly because I liked having the > comment, but also because I was thinking the cache could theoretically get a hit. > But that's not true. KVM should return RET_PF_WRITE_PROTECTED if and only if > there is a memslot, and creating a memslot is supposed to invalidate the MMIO > cache by virtue of changing the memslot generation. > > Unless someone feels strongly that the mmio_info_in_cache() call should be > deleted entirely, I'll tack on this patch. The cache lookup is cheap, and IMO > it's helpful to explicitly document that it should be impossible to reach this > point with what appears to be MMIO. > > --- > arch/x86/kvm/mmu/mmu.c | 28 +++++++++++++++++++--------- > 1 file changed, 19 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index 50695eb2ee22..7f3f57237f23 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -5997,6 +5997,18 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, > vcpu->arch.last_retry_eip = 0; > vcpu->arch.last_retry_addr = 0; > > + /* > + * It should be impossible to reach this point with an MMIO cache hit, > + * as RET_PF_WRITE_PROTECTED is returned if and only if there's a valid, > + * writable memslot, and creating a memslot should invalidate the MMIO > + * cache by way of changing the memslot generation. WARN and disallow > + * retry if MMIO is detect, as retrying MMIO emulation is pointless and > + * could put the vCPU into an infinite loop because the processor will > + * keep faulting on the non-existent MMIO address. > + */ > + if (WARN_ON_ONCE(mmio_info_in_cache(vcpu, cr2_or_gpa, direct))) > + return RET_PF_EMULATE; > + LGTM. Reviewed-by: Yuan Yao <yuan.yao@xxxxxxxxx> > /* > * Before emulating the instruction, check to see if the access may be > * due to L1 accessing nested NPT/EPT entries used for L2, i.e. if the > @@ -6029,17 +6041,15 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, > return RET_PF_FIXED; > > /* > - * The gfn is write-protected, but if emulation fails we can still > - * optimistically try to just unprotect the page and let the processor > + * The gfn is write-protected, but if KVM detects its emulating an > + * instruction that is unlikely to be used to modify page tables, or if > + * emulation fails, KVM can try to unprotect the gfn and let the CPU > * re-execute the instruction that caused the page fault. Do not allow > - * retrying MMIO emulation, as it's not only pointless but could also > - * cause us to enter an infinite loop because the processor will keep > - * faulting on the non-existent MMIO address. Retrying an instruction > - * from a nested guest is also pointless and dangerous as we are only > - * explicitly shadowing L1's page tables, i.e. unprotecting something > - * for L1 isn't going to magically fix whatever issue cause L2 to fail. > + * retrying an instruction from a nested guest as KVM is only explicitly > + * shadowing L1's page tables, i.e. unprotecting something for L1 isn't > + * going to magically fix whatever issue cause L2 to fail. > */ > - if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu)) > + if (!is_guest_mode(vcpu)) > *emulation_type |= EMULTYPE_ALLOW_RETRY_PF; > > return RET_PF_EMULATE; > > base-commit: 7d33880356496eb0640c6c825cc60898063c4902 > --