Thanks for doing this. I'm not overly familiar with KVM implementation but am familiar with mmu notifiers so read through the KVM usage. Looks like KVM is sort of doing something similar to what mmu_interval_notifiers do and I wonder if some of that could be shared. Anyway I didn't spot anything wrong here so feel free to add: Reviewed-by: Alistair Popple <apopple@xxxxxxxxxx> Sean Christopherson <seanjc@xxxxxxxxxx> writes: > Re-request an APIC-access page reload if there is a relevant mmu_notifier > invalidation in-progress when KVM retrieves the backing pfn, i.e. stall > vCPUs until the backing pfn for the APIC-access page is "officially" > stable. Relying on the primary MMU to not make changes after invoking > ->invalidate_range() works, e.g. any additional changes to a PRESENT PTE > would also trigger an ->invalidate_range(), but using ->invalidate_range() > to fudge around KVM not honoring past and in-progress invalidations is a > bit hacky. > > Honoring invalidations will allow using KVM's standard mmu_notifier hooks > to detect APIC-access page reloads, which will in turn allow removing > KVM's implementation of ->invalidate_range() (the APIC-access page case is > a true one-off). > > Opportunistically add a comment to explain why doing nothing if a memslot > isn't found is functionally correct. > > Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > Cc: Alistair Popple <apopple@xxxxxxxxxx> > Cc: Robin Murphy <robin.murphy@xxxxxxx> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > --- > arch/x86/kvm/vmx/vmx.c | 50 +++++++++++++++++++++++++++++++++++++----- > 1 file changed, 45 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 44fb619803b8..59195f0dc7a5 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -6708,7 +6708,12 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) > > static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu) > { > - struct page *page; > + const gfn_t gfn = APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT; > + struct kvm *kvm = vcpu->kvm; > + struct kvm_memslots *slots = kvm_memslots(kvm); > + struct kvm_memory_slot *slot; > + unsigned long mmu_seq; > + kvm_pfn_t pfn; > > /* Defer reload until vmcs01 is the current VMCS. */ > if (is_guest_mode(vcpu)) { > @@ -6720,18 +6725,53 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu) > SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) > return; > > - page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); > - if (is_error_page(page)) > + /* > + * Grab the memslot so that the hva lookup for the mmu_notifier retry > + * is guaranteed to use the same memslot as the pfn lookup, i.e. rely > + * on the pfn lookup's validation of the memslot to ensure a valid hva > + * is used for the retry check. > + */ > + slot = id_to_memslot(slots, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT); > + if (!slot || slot->flags & KVM_MEMSLOT_INVALID) > return; > > - vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page)); > + /* > + * Ensure that the mmu_notifier sequence count is read before KVM > + * retrieves the pfn from the primary MMU. Note, the memslot is > + * protected by SRCU, not the mmu_notifier. Pairs with the smp_wmb() > + * in kvm_mmu_invalidate_end(). > + */ > + mmu_seq = kvm->mmu_invalidate_seq; > + smp_rmb(); > + > + /* > + * No need to retry if the memslot does not exist or is invalid. KVM > + * controls the APIC-access page memslot, and only deletes the memslot > + * if APICv is permanently inhibited, i.e. the memslot won't reappear. > + */ > + pfn = gfn_to_pfn_memslot(slot, gfn); > + if (is_error_noslot_pfn(pfn)) > + return; > + > + read_lock(&vcpu->kvm->mmu_lock); > + if (mmu_invalidate_retry_hva(kvm, mmu_seq, > + gfn_to_hva_memslot(slot, gfn))) { > + kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); > + read_unlock(&vcpu->kvm->mmu_lock); > + goto out; > + } > + > + vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn)); > + read_unlock(&vcpu->kvm->mmu_lock); > + > vmx_flush_tlb_current(vcpu); > > +out: > /* > * Do not pin apic access page in memory, the MMU notifier > * will call us again if it is migrated or swapped out. > */ > - put_page(page); > + kvm_release_pfn_clean(pfn); > } > > static void vmx_hwapic_isr_update(int max_isr)