Good suggestion Sean, thank you. I've split this change into a revert
of 71883a62fcd6 and another patch with the new logic, and sent them
both out as a v2 patch set. I put the "Cc: stable@xxxxxxxxxxxxxxx" tag
on the revert as well. I hope that was correct.
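For anyone following along: the fix below leans on the fact that
kvm_flush_remote_tlbs_with_address() degrades to a full remote TLB
flush when range based flushing isn't available. The plumbing in
arch/x86/kvm/mmu.c looks roughly like the sketch below (simplified
and from memory, so treat the exact error value and struct layout as
illustrative rather than verbatim):

    /*
     * Sketch of the range-based flush fallback: if the platform hook
     * for flushing a gfn range is absent or fails, fall back to a
     * full remote TLB flush, so correctness never depends on range
     * support being present.
     */
    static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
                                                 struct kvm_tlb_range *range)
    {
            int ret = -ENOTSUPP;

            if (range && kvm_x86_ops->tlb_remote_flush_with_range)
                    ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range);

            /* No range support (or the hook failed): flush everything. */
            if (ret)
                    kvm_flush_remote_tlbs(kvm);
    }

    static void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
                                                   u64 start_gfn, u64 pages)
    {
            struct kvm_tlb_range range;

            range.start_gfn = start_gfn;
            range.pages = pages;

            kvm_flush_remote_tlbs_with_range(kvm, &range);
    }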
On Tue, Mar 12, 2019 at 9:57 AM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Tue, Mar 12, 2019 at 09:17:27AM -0700, Ben Gardon wrote:
> > kvm_zap_gfn_range contains an optimization to use gfn-limited TLB
> > flushes, if enabled. If using these limited flushes,
> > kvm_zap_gfn_range passes lock_flush_tlb=false to
> > slot_handle_level_range, which creates a race when the function
> > unlocks to call cond_resched. See an example of this race below:
> >
> >          CPU 0                 CPU 1                 CPU 2
> > // kvm_zap_gfn_range
> > mmu_lock()
> > // *ptep == pte_1
> > *ptep = 0
> > if (lock_flush_tlb)
> >         flush_tlbs()
> > mmu_unlock()
> >                        // In invalidate range
> >                        // MMU notifier
> >                        mmu_lock()
> >                        if (pte != 0)
> >                                *ptep = 0
> >                                flush = true
> >                        if (flush)
> >                                flush_remote_tlbs()
> >                        mmu_unlock()
> >                        return
> >                        // Host MM reallocates
> >                        // page previously
> >                        // backing guest memory.
> >                                               // Guest accesses
> >                                               // invalid page
> >                                               // through pte_1
> >                                               // in its TLB!!
> >
> > Avoid the above race by passing lock_flush_tlb=true from
> > kvm_zap_gfn_range and replacing kvm_flush_remote_tlbs with
> > kvm_flush_remote_tlbs_with_address in slot_handle_level_range.
> > When range based flushes are not enabled,
> > kvm_flush_remote_tlbs_with_address falls back to
> > kvm_flush_remote_tlbs.
> >
> > This does change the behavior of many other functions that
> > indirectly use slot_handle_level_range, iff the range based
> > flushes are enabled. The only potential problem I see with this is
> > that kvm->tlbs_dirty will be cleared less often; however, the only
> > caller of slot_handle_level_range that checks tlbs_dirty is
> > kvm_mmu_notifier_invalidate_range_start, which checks it and does
> > a kvm_flush_remote_tlbs after calling kvm_unmap_hva_range anyway.
> >
> > Tested: Ran all kvm-unit-tests on an Intel Haswell machine with and
> >         without this patch. The patch introduced no new failures.
> >
> > Signed-off-by: Ben Gardon <bgardon@xxxxxxxxxx>
> > Reviewed-by: Junaid Shahid <junaids@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/mmu.c | 17 ++++++-----------
> >  1 file changed, 6 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index 8d43b7c0f56fd..2e6abfa47a2e4 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -5501,7 +5501,9 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >
> >               if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
> >                       if (flush && lock_flush_tlb) {
> > -                             kvm_flush_remote_tlbs(kvm);
> > +                             kvm_flush_remote_tlbs_with_address(kvm,
> > +                                             start_gfn,
> > +                                             iterator.gfn - start_gfn + 1);
> >                               flush = false;
> >                       }
> >                       cond_resched_lock(&kvm->mmu_lock);
> > @@ -5509,7 +5511,8 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >       }
> >
> >       if (flush && lock_flush_tlb) {
> > -             kvm_flush_remote_tlbs(kvm);
> > +             kvm_flush_remote_tlbs_with_address(kvm, start_gfn,
> > +                                                end_gfn - start_gfn + 1);
> >               flush = false;
> >       }
> >
> > @@ -5660,13 +5663,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> >  {
> >       struct kvm_memslots *slots;
> >       struct kvm_memory_slot *memslot;
> > -     bool flush_tlb = true;
> >       bool flush = false;
>
> 'flush' is no longer used. It's set by the slot_handle_level_range()
> call, but never used. And I'm pretty sure that removing 'flush' makes
> this part of the patch a full revert of commit 71883a62fcd6 ("KVM/MMU:
> Flush tlb directly in the kvm_zap_gfn_range()").
>
> From a stable backporting perspective, I think it would make sense to
> first revert the aforementioned commit, and then introduce the new
> logic. That'd allow the pure fix to be backported to stable kernels
> without potentially introducing new issues due to changing the
> behavior of slot_handle_level_range().
>
> >       int i;
> >
> > -     if (kvm_available_flush_tlb_with_range())
> > -             flush_tlb = false;
> > -
> >       spin_lock(&kvm->mmu_lock);
> >       for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> >               slots = __kvm_memslots(kvm, i);
> > @@ -5681,14 +5680,10 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> >                               flush |= slot_handle_level_range(kvm, memslot,
> >                                               kvm_zap_rmapp, PT_PAGE_TABLE_LEVEL,
> >                                               PT_MAX_HUGEPAGE_LEVEL, start,
> > -                                             end - 1, flush_tlb);
> > +                                             end - 1, true);
> >               }
> >       }
> >
> > -     if (flush)
> > -             kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
> > -                                                gfn_end - gfn_start + 1);
> > -
> >       spin_unlock(&kvm->mmu_lock);
> >  }
> >
> > --
> > 2.21.0.360.g471c308f928-goog
> >
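One more note on the tlbs_dirty point in the commit message above: the
reason the less frequent clearing should be benign is that the
invalidate_range_start notifier already folds tlbs_dirty into its own
flush decision after calling kvm_unmap_hva_range. A simplified sketch
of that path follows (based on virt/kvm/kvm_main.c around this time;
details are from memory and may differ between kernel versions):

    /*
     * Sketch of the MMU notifier path referenced above.  Even if a
     * zap path leaves kvm->tlbs_dirty set rather than flushing
     * immediately, the dirty count is folded into need_tlb_flush
     * here, and a full flush is issued before the host MM is allowed
     * to reuse the pages.
     */
    static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
                            const struct mmu_notifier_range *range)
    {
            struct kvm *kvm = mmu_notifier_to_kvm(mn);
            int need_tlb_flush = 0, idx;

            idx = srcu_read_lock(&kvm->srcu);
            spin_lock(&kvm->mmu_lock);

            kvm->mmu_notifier_count++;
            need_tlb_flush = kvm_unmap_hva_range(kvm, range->start,
                                                 range->end);
            need_tlb_flush |= kvm->tlbs_dirty;
            /* The TLBs must be flushed before the pages can be freed. */
            if (need_tlb_flush)
                    kvm_flush_remote_tlbs(kvm);

            spin_unlock(&kvm->mmu_lock);
            srcu_read_unlock(&kvm->srcu, idx);

            return 0;
    }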