On Wed, Mar 31, 2021, Paolo Bonzini wrote:
> On 26/03/21 03:19, Sean Christopherson wrote:
> > +#ifdef KVM_ARCH_WANT_NEW_MMU_NOTIFIER_APIS
> > +	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
> > +#else
> > 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
> > 	int idx;
> > 
> > 	trace_kvm_set_spte_hva(address);
> > 
> > 	idx = srcu_read_lock(&kvm->srcu);
> > 
> > 	KVM_MMU_LOCK(kvm);
> > 
> > 	kvm->mmu_notifier_seq++;
> > 
> > 	if (kvm_set_spte_hva(kvm, address, pte))
> > 		kvm_flush_remote_tlbs(kvm);
> > 
> > 	KVM_MMU_UNLOCK(kvm);
> > 	srcu_read_unlock(&kvm->srcu, idx);
> > +#endif
> 
> The kvm->mmu_notifier_seq is missing in the new API side.  I guess you can
> add an argument to __kvm_handle_hva_range and handle it also in patch 15
> ("KVM: Take mmu_lock when handling MMU notifier iff the hva hits a
> memslot").

Yikes.  Good eyes!

That does bring up an oddity I discovered when digging into this.  Every call
to .change_pte() is bookended by .invalidate_range_{start,end}(), i.e. the
above missing kvm->mmu_notifier_seq++ is benign because
kvm->mmu_notifier_count is guaranteed to be non-zero.

I'm also fairly certain it means kvm_set_spte_gfn() is effectively dead code
on _all_ architectures.  x86 and MIPS are clear-cut nops if the old SPTE is
not-present, and that's guaranteed due to the prior invalidation.  PPC simply
unmaps the SPTE, which again should be a nop due to the invalidation.  arm64
is a bit murky, but if I'm reading the code correctly, it's also a nop because
kvm_pgtable_stage2_map() is called without a cache pointer, which I think
means it will map an entry if and only if an existing PTE was found.

I haven't actually tested the above analysis, e.g. by asserting that
kvm->mmu_notifier_count is indeed non-zero; I'll do that sooner rather than
later.  But, given the shortlog of commit 6bdb913f0a70 ("mm: wrap calls to
set_pte_at_notify with invalidate_range_start and invalidate_range_end"),
I'm fairly confident my analysis is correct.

And if so, it also means that the whole point of adding .change_pte() in the
first place (for KSM, commit 828502d30073, "ksm: add mmu_notifier
set_pte_at_notify()") has since been lost.  When it was originally added,
.change_pte() was a pure alternative to invalidating the entry:

  void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address,
				 pte_t pte)
  {
	struct mmu_notifier *mn;
	struct hlist_node *n;

	rcu_read_lock();
	hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) {
		if (mn->ops->change_pte)
			mn->ops->change_pte(mn, mm, address, pte);
		/*
		 * Some drivers don't have change_pte,
		 * so we must call invalidate_page in that case.
		 */
		else if (mn->ops->invalidate_page)
			mn->ops->invalidate_page(mn, mm, address);
	}
	rcu_read_unlock();
  }

The aforementioned commit 6bdb913f0a70 wrapped set_pte_at_notify() with
invalidate_range_{start,end}() so that .invalidate_page() implementations
could sleep.  But no one noticed that in doing so, .change_pte() was
completely neutered.

Assuming all of the above is correct, I'm very tempted to rip out
.change_pte() entirely.  It's been dead weight for 8+ years and no one has
complained about KSM+KVM performance (I'd also be curious to know how much
performance was actually gained by shaving VM-Exits).  And since KVM is the
only user of .change_pte(), dropping it in KVM would mean the .change_pte()
notifier itself could go away with it.
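A few untested sketches to flesh out the above.  For plumbing the seq bump
through the new API per Paolo's suggestion, I'm thinking something along
these lines; note that the "bump_seq" flag and the exact kvm_hva_range
layout are placeholders for illustration, not the actual series:

  struct kvm_hva_range {
	unsigned long start;
	unsigned long end;
	pte_t pte;
	hva_handler_t handler;
	bool flush_on_ret;
	bool bump_seq;		/* placeholder: set only by .change_pte() */
  };

  static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
						    const struct kvm_hva_range *range)
  {
	bool ret = false;

	KVM_MMU_LOCK(kvm);

	/* Bump the sequence only for notifiers that actually need it. */
	if (range->bump_seq)
		kvm->mmu_notifier_seq++;

	/* ... walk memslots overlapping [start, end), invoke range->handler() ... */

	if (range->flush_on_ret && ret)
		kvm_flush_remote_tlbs(kvm);

	KVM_MMU_UNLOCK(kvm);

	return 0;
  }

Given the bookending above, an alternative would be to skip the bump entirely
and rely on the elevated mmu_notifier_count, but that deserves a comment
either way.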
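For the "assert that kvm->mmu_notifier_count is non-zero" experiment, the
sketch is trivial (untested, hasn't seen a compiler):

  static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
					  struct mm_struct *mm,
					  unsigned long address, pte_t pte)
  {
	struct kvm *kvm = mmu_notifier_to_kvm(mn);

	/*
	 * .change_pte() must be bookended by .invalidate_range_{start,end}(),
	 * thus mmu_notifier_count must be elevated.  Scream if it isn't.
	 */
	WARN_ON_ONCE(!kvm->mmu_notifier_count);

	/* ... existing handling ... */
  }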
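To illustrate the x86 "clear-cut nop" claim: the handler finds its targets by
walking the gfn's rmap, e.g. loosely modeled on kvm_set_pte_rmapp() (heavily
simplified, the real code also deals with write-protected SPTEs, huge pages,
etc.):

  for_each_rmap_spte(rmap_head, &iter, sptep) {
	/*
	 * Never reached if .invalidate_range_start() already zapped the
	 * SPTEs for this gfn, i.e. the rmap list is empty and the new
	 * PTE is never installed.
	 */
	mmu_spte_clear_track_bits(sptep);
	mmu_spte_set(sptep, new_spte);
  }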
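And the neutering itself: abridged from what commit 6bdb913f0a70 turned KSM's
replace_page() into (mm/ksm.c, details elided, see the commit for the full
context):

  mmun_start = addr;
  mmun_end   = addr + PAGE_SIZE;
  mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

  ...

  /*
   * Fires .change_pte(), but KVM's SPTE was already zapped by
   * invalidate_range_start() and can't be recreated until
   * invalidate_range_end(), i.e. the notification is useless.
   */
  set_pte_at_notify(mm, addr, ptep, newpte);

  ...

  mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);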