On Wed, Jan 30, 2019 at 05:43:53PM +0100, Paolo Bonzini wrote:
> On 10/01/19 19:06, Sean Christopherson wrote:
> > Revert back to a dedicated (and slower) mechanism for handling the
> > scenario where all MMIO shadow PTEs need to be zapped due to
> > overflowing the MMIO generation number.  The MMIO generation scenario
> > is almost literally a one-in-a-million occurrence, i.e. is not a
> > performance sensitive scenario.
> >
> > Restoring kvm_mmu_zap_mmio_sptes() leaves VM teardown as the only user
> > of kvm_mmu_invalidate_zap_all_pages() and paves the way for removing
> > the fast invalidate mechanism altogether.
> >
> > This reverts commit a8eca9dcc656a405a28ffba43f3d86a1ff0eb331.
> >
> > Cc: Xiao Guangrong <guangrong.xiao@xxxxxxxxx>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > ---

[...]

> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index bb56beb166e4..df41193aba38 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -386,6 +386,8 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
> >  		mask |= (gpa & shadow_nonpresent_or_rsvd_mask)
> >  			<< shadow_nonpresent_or_rsvd_mask_len;
> >
> > +	page_header(__pa(sptep))->mmio_cached = true;
> > +
> >  	trace_mark_mmio_spte(sptep, gfn, access, gen);
> >  	mmu_spte_set(sptep, mask);
> >  }
> > @@ -5932,6 +5934,24 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> >  	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> >  }
> >
> > +static void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
> > +{
> > +	struct kvm_mmu_page *sp, *node;
> > +	LIST_HEAD(invalid_list);
> > +
> > +	spin_lock(&kvm->mmu_lock);
> > +restart:
> > +	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
> > +		if (!sp->mmio_cached)
> > +			continue;
> > +		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> > +			goto restart;
>
> This is potentially quadratic.  If the "- 150" hack is removed, I'd
> rather have a zap-all instead.
Why is the walk restarted after zapping children?  Commit 0738541396be
("KVM: MMU: awareness of new kvm_mmu_zap_page behaviour") added the
restart and simply states:

    kvm_mmu_zap_page will soon zap the unsynced children of a page.
    Restart list walk in such case.

Which isn't exactly illuminating, but it does imply that the MMIO case
should never need to restart since MMIO sptes can't have children.

The "unsynced children" behavior was added by commit 4731d4c7a077 ("KVM:
MMU: out of sync shadow core"), and at that time kvm_mmu_zap_page() only
returned the number of zapped unsynced children.  Commit 54a4f0239f2e
("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it
actually freed") extended the return value to include the page itself,
which makes sense for the general case, but it means restarting the walk
on a non-zero return value is at least partially wrong, if not
completely unnecessary.
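To make Paolo's "potentially quadratic" point concrete: restarting from the
head on every non-zero return re-scans the untouched prefix of the list once
per zapped page.  Below is a toy model, not the real KVM code -- `struct
node`, `prepare_zap()`, and `zap_restarting()` are made-up stand-ins for
`struct kvm_mmu_page`, `kvm_mmu_prepare_zap_page()`, and the loop in the
patch -- that counts list-entry visits so the cost is observable:

```c
#include <stddef.h>
#include <assert.h>	/* for the self-checks below */

/* Minimal intrusive circular list in the style of <linux/list.h>. */
struct node {
	struct node *prev, *next;
	int mmio_cached;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/* Stand-in for kvm_mmu_prepare_zap_page(): unlink the page and report
 * the number of pages freed.  In this model it is always exactly 1,
 * mirroring Sean's observation that MMIO sptes can't have unsynced
 * children -- nothing else in the list is ever perturbed, which is
 * precisely why the restart looks unnecessary for the MMIO case. */
static int prepare_zap(struct node *n)
{
	list_del(n);
	return 1;
}

/* The loop shape from the patch: restart from the head whenever
 * prepare_zap() returns non-zero.  Returns total entries visited so
 * the quadratic behavior can be measured. */
static long zap_restarting(struct node *head)
{
	struct node *sp, *tmp;
	long visited = 0;

restart:
	for (sp = head->next; sp != head; sp = tmp) {
		tmp = sp->next;
		visited++;
		if (!sp->mmio_cached)
			continue;
		if (prepare_zap(sp))
			goto restart;
	}
	return visited;
}
```

With N pages of which the last K are mmio_cached, each of the K zaps rescans
the (N-K)-entry prefix, so the walk does (N-K+1)*K + (N-K) visits instead of
N -- quadratic when K scales with N.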