On Tue, Aug 24, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
>
> When kvm->tlbs_dirty > 0, some rmaps might have been deleted
> without flushing tlb remotely after kvm_sync_page(). If @gfn
> was writable before and it's rmaps was deleted in kvm_sync_page(),
> we need to flush tlb too even if __rmap_write_protect() doesn't
> request it.
>
> Fixes: 4731d4c7a077 ("KVM: MMU: out of sync shadow core")

Should be

  Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path")

> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4853c033e6ce..313918df1a10 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1420,6 +1420,14 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
>  			rmap_head = gfn_to_rmap(gfn, i, slot);
>  			write_protected |= __rmap_write_protect(kvm, rmap_head, true);
>  		}
> +		/*
> +		 * When kvm->tlbs_dirty > 0, some rmaps might have been deleted
> +		 * without flushing tlb remotely after kvm_sync_page(). If @gfn
> +		 * was writable before and it's rmaps was deleted in kvm_sync_page(),
> +		 * we need to flush tlb too.
> +		 */
> +		if (min_level == PG_LEVEL_4K && kvm->tlbs_dirty)
> +			write_protected = true;
>  	}
>
>  	if (is_tdp_mmu_enabled(kvm))
> @@ -5733,6 +5741,14 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>  		flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
>  					  start_level, KVM_MAX_HUGEPAGE_LEVEL,
>  					  false);
> +		/*
> +		 * When kvm->tlbs_dirty > 0, some rmaps might have been deleted
> +		 * without flushing tlb remotely after kvm_sync_page(). If @gfn
> +		 * was writable before and it's rmaps was deleted in kvm_sync_page(),
> +		 * we need to flush tlb too.
> +		 */
> +		if (start_level == PG_LEVEL_4K && kvm->tlbs_dirty)
> +			flush = true;
>  		write_unlock(&kvm->mmu_lock);
>  }

My vote is to do a revert of a4ee1ca4a36e with slightly less awful batching, and
then improve the batching even further if there's a noticeable loss of
performance (or just tell people to stop using shadow paging :-D).  Zapping
SPTEs but not flushing is just asking for these types of whack-a-mole bugs.

E.g. instead of a straight revert, do this for sync_page():

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 50ade6450ace..1fca27a08c00 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1095,13 +1095,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 			return 0;

 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
-			/*
-			 * Update spte before increasing tlbs_dirty to make
-			 * sure no tlb flush is lost after spte is zapped; see
-			 * the comments in kvm_flush_remote_tlbs().
-			 */
-			smp_wmb();
-			vcpu->kvm->tlbs_dirty++;
+			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
 			continue;
 		}

@@ -1116,12+1110,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)

 		if (gfn != sp->gfns[i]) {
 			drop_spte(vcpu->kvm, &sp->spt[i]);
-			/*
-			 * The same as above where we are doing
-			 * prefetch_invalid_gpte().
-			 */
-			smp_wmb();
-			vcpu->kvm->tlbs_dirty++;
+			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
 			continue;
 		}