On Fri, Oct 22, 2021, Maciej S. Szmigiero wrote:
> On 22.10.2021 03:00, Sean Christopherson wrote:
> > Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that
> > said function holds mmu_lock for write for its entire duration.  The
> > flush was added by the now-reverted commit to allow TDP MMU to flush
> > while holding mmu_lock for read, as the transition from write=>read
> > required dropping the lock and thus a pending flush needed to be
> > serviced.
> >
> > Fixes: 5a324c24b638 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
> > Cc: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > Cc: Maciej S. Szmigiero <maciej.szmigiero@xxxxxxxxxx>
> > Cc: Ben Gardon <bgardon@xxxxxxxxxx>
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index f82b192bba0b..e8b8a665e2e9 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5700,9 +5700,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> >  							end - 1, true, flush);
> >  			}
> >  		}
> > -		if (flush)
> > -			kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
> > -							   gfn_end - gfn_start);
> >  	}
> >
> >  	if (is_tdp_mmu_enabled(kvm)) {
>
> Unfortunately, it seems that a pending flush from __kvm_zap_rmaps()
> can be reset back to false by the following line:
>
>   flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start, gfn_end, flush);
>
> kvm_tdp_mmu_zap_gfn_range() calls __kvm_tdp_mmu_zap_gfn_range() with
> "can_yield" set to true, which passes it to zap_gfn_range(), which has
> this code:
>
>   if (can_yield &&
>       tdp_mmu_iter_cond_resched(kvm, &iter, flush, shared)) {
>           flush = false;
>           continue;
>   }

That's working by design.  If the MMU (legacy or TDP) yields during a zap, it
_must_ flush before dropping mmu_lock so that any SPTE modifications are
guaranteed to be observed by all vCPUs.  Clearing "flush" is deliberate/correct,
as another flush is needed if and only if additional SPTE modifications are
made after the yield.

static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
					     struct tdp_iter *iter,
					     bool flush, bool shared)
{
	/* Ensure forward progress has been made before yielding. */
	if (iter->next_last_level_gfn == iter->yielded_gfn)
		return false;

	if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
		rcu_read_unlock();

		if (flush)
			kvm_flush_remote_tlbs(kvm);   <------- ****** HERE ******

		if (shared)
			cond_resched_rwlock_read(&kvm->mmu_lock);
		else
			cond_resched_rwlock_write(&kvm->mmu_lock);

		rcu_read_lock();

		WARN_ON(iter->gfn > iter->next_last_level_gfn);

		tdp_iter_restart(iter);

		return true;
	}

	return false;
}
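
If it helps to see that bookkeeping in isolation, below is a minimal,
self-contained userspace C sketch of the pattern.  It is purely illustrative:
the fake_*() helpers, the tlb_dirty flag, and the yield-every-4th-check
heuristic are made up for this example and are not KVM APIs.  It models the
invariant described above: "flush" is set whenever an entry is modified,
serviced before every yield, and cleared afterwards, so a final flush is
needed if and only if modifications were made after the last yield.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_ENTRIES 16

static int sptes[NR_ENTRIES];	/* stand-in for shadow page table entries */
static bool tlb_dirty;		/* true while remote TLBs may hold stale entries */
static unsigned long checks;	/* counts cond_resched checks */

/* Stand-in for kvm_flush_remote_tlbs(): afterwards, nothing stale remains. */
static void fake_flush_remote_tlbs(void)
{
	tlb_dirty = false;
}

/*
 * Stand-in for tdp_mmu_iter_cond_resched(): when a "reschedule" is due,
 * service any pending flush before the (imaginary) mmu_lock is dropped.
 */
static bool fake_cond_resched(bool flush)
{
	if (++checks % 4)	/* arbitrary: yield on every 4th check */
		return false;

	if (flush)
		fake_flush_remote_tlbs();

	/* The invariant: nothing stale may be visible while "unlocked". */
	assert(!tlb_dirty);
	return true;
}

/* Stand-in for zap_gfn_range()'s flush bookkeeping. */
static bool fake_zap_range(bool flush)
{
	int i = 0;

	while (i < NR_ENTRIES) {
		if (fake_cond_resched(flush)) {
			flush = false;	/* flush was serviced while yielding */
			continue;	/* retry this entry, like tdp_iter_restart() */
		}
		if (sptes[i]) {
			sptes[i] = 0;	/* "zap" the entry... */
			tlb_dirty = true;
			flush = true;	/* ...so another flush is now required */
		}
		i++;
	}
	return flush;	/* caller flushes iff modifications are still pending */
}

int main(void)
{
	bool flush;
	int i;

	for (i = 0; i < NR_ENTRIES; i++)
		sptes[i] = 1;

	flush = fake_zap_range(false);
	if (flush)
		fake_flush_remote_tlbs();

	assert(!tlb_dirty);
	printf("flush serviced before every yield, final flush=%d\n", flush);
	return 0;
}

In this toy run the asserts verify the same property the real code relies on:
a yield never makes stale translations visible, and clearing "flush" after a
yield never loses a needed flush, because any later modification sets it again.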