On Wed, Jan 27, 2021, David Stevens wrote: > From: David Stevens <stevensd@xxxxxxxxxxxx> > > Track the range being invalidated by mmu_notifier and skip page fault > retries if the fault address is not affected by the in-progress > invalidation. Handle concurrent invalidations by finding the minimal > range which includes all ranges being invalidated. Although the combined > range may include unrelated addresses and cannot be shrunk as individual > invalidation operations complete, it is unlikely the marginal gains of > proper range tracking are worth the additional complexity. > > The primary benefit of this change is the reduction in the likelihood of > extreme latency when handing a page fault due to another thread having > been preempted while modifying host virtual addresses. > > Signed-off-by: David Stevens <stevensd@xxxxxxxxxxxx> > --- > v1 -> v2: > - improve handling of concurrent invalidation requests by unioning > ranges, instead of just giving up and using [0, ULONG_MAX). Ooh, even better. > - add lockdep check > - code comments and formatting > > arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +- > arch/x86/kvm/mmu/mmu.c | 16 ++++++++------ > arch/x86/kvm/mmu/paging_tmpl.h | 7 ++++--- > include/linux/kvm_host.h | 27 +++++++++++++++++++++++- > virt/kvm/kvm_main.c | 29 ++++++++++++++++++++++---- > 6 files changed, 67 insertions(+), 16 deletions(-) > ... > @@ -3717,7 +3720,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, > mmu_seq = vcpu->kvm->mmu_notifier_seq; > smp_rmb(); > > - if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) > + if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva, > + write, &map_writable)) > return RET_PF_RETRY; > > if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r)) > @@ -3725,7 +3729,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, > > r = RET_PF_RETRY; > spin_lock(&vcpu->kvm->mmu_lock); > - if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) > + if (mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva)) 'hva' will be uninitialized at this point if the gfn did not resolve to a memslot, i.e. when handling an MMIO page fault. On the plus side, that's an opportunity for another optimization as there is no need to retry MMIO page faults on mmu_notifier invalidations. Including the attached patch as a preqreq to this will avoid consuming an uninitialized 'hva'. > goto out_unlock; > r = make_mmu_pages_available(vcpu); > if (r) ... > void kvm_release_pfn_clean(kvm_pfn_t pfn); > void kvm_release_pfn_dirty(kvm_pfn_t pfn); > @@ -1203,6 +1206,28 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq) > return 1; > return 0; > } > + > +static inline int mmu_notifier_retry_hva(struct kvm *kvm, > + unsigned long mmu_seq, > + unsigned long hva) > +{ > +#ifdef CONFIG_LOCKDEP > + lockdep_is_held(&kvm->mmu_lock); No need to manually do the #ifdef, just use lockdep_assert_held instead of lockdep_is_held. > +#endif > + /* > + * If mmu_notifier_count is non-zero, then the range maintained by > + * kvm_mmu_notifier_invalidate_range_start contains all addresses that > + * might be being invalidated. Note that it may include some false > + * positives, due to shortcuts when handing concurrent invalidations. > + */ > + if (unlikely(kvm->mmu_notifier_count) && > + kvm->mmu_notifier_range_start <= hva && > + hva < kvm->mmu_notifier_range_end) Uber nit: I find this easier to read if 'hva' is on the left-hand side for both checks, i.e. if (unlikely(kvm->mmu_notifier_count) && hva >= kvm->mmu_notifier_range_start && hva < kvm->mmu_notifier_range_end) > + return 1; > + if (kvm->mmu_notifier_seq != mmu_seq) > + return 1; > + return 0; > +} > #endif > > #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>From a1bfdc6fe16582440815cfecc656313dff993003 Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc@xxxxxxxxxx> Date: Wed, 27 Jan 2021 10:04:45 -0800 Subject: [PATCH] KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault Don't retry a page fault due to an mmu_notifier invalidation when handling a page fault for a GPA that did not resolve to a memslot, i.e. an MMIO page fault. Invalidations from the mmu_notifier signal a change in a host virtual address (HVA) mapping; without a memslot, there is no HVA and thus no possibility that the invalidation is relevant to the page fault being handled. Note, the MMIO vs. memslot generation checks handle the case where a pending memslot will create a memslot overlapping the faulting GPA. The mmu_notifier checks are orthogonal to memslot updates. Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> --- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6d16481aa29d..9ac0a727015d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3725,7 +3725,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, r = RET_PF_RETRY; spin_lock(&vcpu->kvm->mmu_lock); - if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) + if (!is_noslot_pfn(pfn) && mmu_notifier_retry(vcpu->kvm, mmu_seq)) goto out_unlock; r = make_mmu_pages_available(vcpu); if (r) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 50e268eb8e1a..ab54263d857c 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -869,7 +869,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, r = RET_PF_RETRY; spin_lock(&vcpu->kvm->mmu_lock); - if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) + if (!is_noslot_pfn(pfn) && mmu_notifier_retry(vcpu->kvm, mmu_seq)) goto out_unlock; kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT); -- 2.30.0.280.ga3ce27912f-goog