On Thu, Jan 13, 2022, Liam Merwick wrote:
> On Fri, Apr 17, 2020 at 12:38:42PM -0400, Paolo Bonzini wrote:
> > When a nested page fault is taken from an address that does not have
> > a memslot associated to it, kvm_mmu_do_page_fault returns RET_PF_EMULATE
> > (via mmu_set_spte) and kvm_mmu_page_fault then invokes
> > svm_need_emulation_on_page_fault.
> > 
> > The default answer there is to return false, but in this case this just
> > causes the page fault to be retried ad libitum.  Since this is not a
> > fast path, and the only other case where it is taken is an erratum,
> > just stick a kvm_vcpu_gfn_to_memslot check in there to detect the
> > common case where the erratum is not happening.
> > 
> > This fixes an infinite loop in the new set_memory_region_test.
> > 
> > Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/svm/svm.c | 7 +++++++
> >  virt/kvm/kvm_main.c    | 1 +
> >  2 files changed, 8 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index a91e397d6750..c86f7278509b 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3837,6 +3837,13 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
> >  	bool smap = cr4 & X86_CR4_SMAP;
> >  	bool is_user = svm_get_cpl(vcpu) == 3;
> >  
> > +	/*
> > +	 * If RIP is invalid, go ahead with emulation which will cause an
> > +	 * internal error exit.
> > +	 */
> > +	if (!kvm_vcpu_gfn_to_memslot(vcpu, kvm_rip_read(vcpu) >> PAGE_SHIFT))
> 
> When looking into an SEV issue, it was noted that the second arg to
> kvm_vcpu_gfn_to_memslot() is a gfn_t, but kvm_rip_read() returns the guest
> RIP, which is a guest virtual address, while memslots hold guest physical
> addresses.  How is KVM supposed to translate that into a memslot and
> determine whether the guest RIP is valid?

Ugh, magic?

That code is complete garbage.  It worked to fix the selftest issue because
the selftest identity-maps the relevant guest code.

The entire idea is a hack.  If KVM gets into an infinite loop because the
guest is attempting to fetch from MMIO, then the #NPF/#PF should have the
FETCH bit set in the error code.  I.e. I believe the below change should fix
the original issue, at which point we can revert the above.  I'll test today
and hopefully get a patch sent out.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c3d9006478a4..e1d2a46e06bf 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1995,6 +1995,17 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 	vmcb_mark_dirty(svm->vmcb, VMCB_DR);
 }
 
+static char *svm_get_pf_insn_bytes(struct vcpu_svm *svm)
+{
+	if (!static_cpu_has(X86_FEATURE_DECODEASSISTS))
+		return NULL;
+
+	if (svm->vmcb->control.exit_info_1 & PFERR_FETCH_MASK)
+		return NULL;
+
+	return svm->vmcb->control.insn_bytes;
+}
+
 static int pf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -2003,9 +2014,8 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
 	return kvm_handle_page_fault(vcpu, error_code, fault_address,
-			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-			svm->vmcb->control.insn_bytes : NULL,
-			svm->vmcb->control.insn_len);
+			svm_get_pf_insn_bytes(svm),
+			svm->vmcb->control.insn_len);
 }
 
 static int npf_interception(struct kvm_vcpu *vcpu)
@@ -2017,9 +2027,8 @@ static int npf_interception(struct kvm_vcpu *vcpu)
 	trace_kvm_page_fault(fault_address, error_code);
 	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
-			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-			svm->vmcb->control.insn_bytes : NULL,
-			svm->vmcb->control.insn_len);
+			svm_get_pf_insn_bytes(svm),
+			svm->vmcb->control.insn_len);
 }
 
 static int db_interception(struct kvm_vcpu *vcpu)
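
For reference, the check in svm_get_pf_insn_bytes() above keys off the FETCH
bit of the x86 page-fault error code (bit 4, reported in exit_info_1 for
#PF/#NPF exits), which is set when the fault was taken while fetching an
instruction rather than accessing data.  Below is a minimal standalone sketch
of that test, assuming only the architectural bit layout; it is a userspace
illustration, not kernel code, with PFERR_FETCH_MASK redefined locally to
mirror the kernel's definition and fault_is_insn_fetch() as a hypothetical
helper.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the kernel's PFERR_FETCH_BIT: bit 4 of the #PF error code. */
#define PFERR_FETCH_MASK	(1ULL << 4)

/* Hypothetical helper: true if the fault was an instruction fetch. */
static bool fault_is_insn_fetch(uint64_t error_code)
{
	return error_code & PFERR_FETCH_MASK;
}

int main(void)
{
	uint64_t fetch_fault = 0x11;	/* PRESENT (bit 0) | FETCH (bit 4) */
	uint64_t write_fault = 0x03;	/* PRESENT (bit 0) | WRITE (bit 1) */

	printf("0x%02llx fetch? %d\n", (unsigned long long)fetch_fault,
	       fault_is_insn_fetch(fetch_fault));
	printf("0x%02llx fetch? %d\n", (unsigned long long)write_fault,
	       fault_is_insn_fetch(write_fault));
	return 0;
}

The intent, as described above, is that for fetch faults the decode-assist
bytes are withheld (NULL), so the emulator tries to fetch the instruction
itself and can fail cleanly instead of retrying the fault forever.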