On Tue, Aug 20, 2024, Will Deacon wrote:
> Hi Sean,
>
> On Fri, Aug 02, 2024 at 12:16:17PM -0700, Sean Christopherson wrote:
> > Do arch-specific range-based TLB flushes (if they're supported) when
> > flushing in response to mmu_notifier events, as a single range-based
> > flush is almost always more performant.  This is especially true in
> > the case of mmu_notifier events, as the majority of events that hit a
> > running VM operate on a relatively small range of memory.
> >
> > Cc: Marc Zyngier <maz@xxxxxxxxxx>
> > Cc: Will Deacon <will@xxxxxxxxxx>
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> >
> > This is *very* lightly tested; a thumbs up from the ARM world would be
> > much appreciated.
> >
> >  virt/kvm/kvm_main.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index d0788d0a72cc..46bb95d58d53 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -599,6 +599,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> >  	struct kvm_gfn_range gfn_range;
> >  	struct kvm_memory_slot *slot;
> >  	struct kvm_memslots *slots;
> > +	bool need_flush = false;
> >  	int i, idx;
> >
> >  	if (WARN_ON_ONCE(range->end <= range->start))
> > @@ -651,10 +652,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> >  				goto mmu_unlock;
> >  			}
> >  			r.ret |= range->handler(kvm, &gfn_range);
> > +
> > +			/*
> > +			 * Use a precise gfn-based TLB flush when possible, as
> > +			 * most mmu_notifier events affect a small-ish range.
> > +			 * Fall back to a full TLB flush if the gfn-based flush
> > +			 * fails, and don't bother trying the gfn-based flush
> > +			 * if a full flush is already pending.
> > +			 */
> > +			if (range->flush_on_ret && !need_flush && r.ret &&
> > +			    kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start,
> > +							     gfn_range.end - gfn_range.start))
> > +				need_flush = true;
>
> Thanks for having a crack at this.
>
> We could still do better in the ->clear_flush_young() case if the
> handler could do the invalidation as part of its page-table walk (for
> example, it could use information about the page-table structure such
> as the level of the leaves to optimise the invalidation further), but
> this does at least avoid zapping the whole VMID on CPUs with range
> support.

For clear_flush_young(), I 100% think we should let architectures opt out
of the flush.  For architectures where it's safe, the primary MMU doesn't
do a TLB flush, and hasn't for years.  Sending patches for this (for at
least x86 and arm64) is on my todo list.

Even better would be to kill off mmu_notifier_clear_flush_young() entirely,
e.g. if all KVM architectures can elide the flush.  And even better than
that would be to kill pxxx_clear_flush_young_notify() in the kernel, but I
suspect that's not feasible, as there are architectures that require a TLB
flush for correctness.

> My only slight concern is that, should clear_flush_young() be extended
> to operate on more than a single page at a time in future, this will
> silently end up invalidating the entire VMID for each memslot unless we
> teach kvm_arch_flush_remote_tlbs_range() to return !0 in that case.

I'm not sure I follow the "entire VMID for each memslot" concern.  Are you
worried about kvm_arch_flush_remote_tlbs_range() failing and triggering a
VM-wide flush?
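
To make the semantics under discussion concrete, here is a minimal sketch
of the range-flush contract the patch leans on: the generic stub returns
non-zero when range flushes aren't implemented, and the caller degrades to
a full, VM-wide flush (a whole-VMID invalidation on arm64).  The names
follow kvm_host.h/kvm_main.c, but treat this as an illustration rather
than the exact upstream code:

/*
 * Sketch of the range-flush contract.  The generic stub rejects
 * range-based flushes so that callers fall back to a full flush;
 * architectures with range support override this and return 0 on
 * success.
 */
static inline int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm,
						   gfn_t gfn, u64 nr_pages)
{
	return -EOPNOTSUPP;	/* no range support: a full flush is needed */
}

void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages)
{
	/* Degrade to a VM-wide (whole-VMID on arm64) flush on failure. */
	if (kvm_arch_flush_remote_tlbs_range(kvm, gfn, nr_pages))
		kvm_flush_remote_tlbs(kvm);
}

Under that contract, a single non-zero return from the arch hook latches
need_flush in the patch above, and the notifier handler performs one full
flush once the memslot walk completes instead of attempting further
range-based flushes.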
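
And as one possible shape for the clear_flush_young() opt-out described
above, a hypothetical sketch: CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG is an
illustrative Kconfig symbol, not an existing option, and the helper names
assume the kvm_handle_hva_range()/kvm_handle_hva_range_no_flush() pair in
kvm_main.c at the time of the patch:

/*
 * Hypothetical sketch of the clear_flush_young() opt-out.  An
 * architecture that, like the primary MMU, can tolerate stale
 * Accessed-bit TLB entries would select the (illustrative) Kconfig
 * symbol and skip the TLB flush when aging pages.
 */
static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
					      struct mm_struct *mm,
					      unsigned long start,
					      unsigned long end)
{
	trace_kvm_age_hva(start, end);

	/* Use the existing no-flush path when the arch opts out. */
	if (IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG))
		return kvm_handle_hva_range_no_flush(mn, start, end,
						     kvm_age_gfn);

	return kvm_handle_hva_range(mn, start, end, kvm_age_gfn);
}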