On Tue, Mar 12, 2019 at 09:17:27AM -0700, Ben Gardon wrote: > zap_direct_gfn_range contains an optimization to use gfn-limited TLB > flushes, if enabled. If using these limited flushes, > zap_direct_gfn_range passes lock_flush_tlb=false to > slot_handle_level_range which creates a race when the function unlocks > to call cond_resched. See an example of this race below: > > CPU 0 CPU 1 CPU 3 > // zap_direct_gfn_range > mmu_lock() > // *ptep == pte_1 > *ptep = 0 > if (lock_flush_tlb) > flush_tlbs() > mmu_unlock() > // In invalidate range > // MMU notifier > mmu_lock() > if (pte != 0) > *ptep = 0 > flush = true > if (flush) > flush_remote_tlbs() > mmu_unlock() > return > // Host MM reallocates > // page previously > // backing guest memory. > // Guest accesses > // invalid page > // through pte_1 > // in its TLB!! > > Avoid the above race by passing lock_flush_tlb=true from kvm_zap_gfn_range > and replacing kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address > in slot_handle_level_range. When range based flushes are not enabled > kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs. > > This does change the behavior of many other functions that indirectly use > slot_handle_level_range, iff the range based flushes are enabled. The > only potential problem I see with this is that kvm->tlbs_dirty will be > cleared less often, however the only caller of slot_handle_level_range that > checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which > checks it and does a kvm_flush_remote_tlbs after calling > kvm_unmap_hva_range anyway. > > Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and > without this patch. The patch introduced no new failures. > > Signed-off-by: Ben Gardon <bgardon@xxxxxxxxxx> > Reviewed-by: Junaid Shadid <junaids@xxxxxxxxxx> > --- > arch/x86/kvm/mmu.c | 17 ++++++----------- > 1 file changed, 6 insertions(+), 11 deletions(-) > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 8d43b7c0f56fd..2e6abfa47a2e4 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -5501,7 +5501,9 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot, > > if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { > if (flush && lock_flush_tlb) { > - kvm_flush_remote_tlbs(kvm); > + kvm_flush_remote_tlbs_with_address(kvm, > + start_gfn, > + iterator.gfn - start_gfn + 1); > flush = false; > } > cond_resched_lock(&kvm->mmu_lock); > @@ -5509,7 +5511,8 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot, > } > > if (flush && lock_flush_tlb) { > - kvm_flush_remote_tlbs(kvm); > + kvm_flush_remote_tlbs_with_address(kvm, start_gfn, > + end_gfn - start_gfn + 1); > flush = false; > } > > @@ -5660,13 +5663,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) > { > struct kvm_memslots *slots; > struct kvm_memory_slot *memslot; > - bool flush_tlb = true; > bool flush = false; 'flush' is no longer used. It's set by the slot_handle_level_range() call, but never used. And I'm pretty sure that removing 'flush' makes this part of the patch a full revert of commit 71883a62fcd6 ("KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"). >From a stable backporting perspective, I think it would make sense to first revert the aforementioned commit, and then introduce the new logic. That'd allow the pure fix to be backported to stable kernels without potentially introducing new issues due to changing the behavior of slot_handle_level_range(). > int i; > > - if (kvm_available_flush_tlb_with_range()) > - flush_tlb = false; > - > spin_lock(&kvm->mmu_lock); > for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { > slots = __kvm_memslots(kvm, i); > @@ -5681,14 +5680,10 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) > flush |= slot_handle_level_range(kvm, memslot, > kvm_zap_rmapp, PT_PAGE_TABLE_LEVEL, > PT_MAX_HUGEPAGE_LEVEL, start, > - end - 1, flush_tlb); > + end - 1, true); > } > } > > - if (flush) > - kvm_flush_remote_tlbs_with_address(kvm, gfn_start, > - gfn_end - gfn_start + 1); > - > spin_unlock(&kvm->mmu_lock); > } > > -- > 2.21.0.360.g471c308f928-goog >