On Tue, Sep 24, 2024 at 12:45:52AM -0700, Sean Christopherson wrote:
> On Tue, Sep 24, 2024, Yan Zhao wrote:
> > On Mon, Sep 23, 2024 at 11:37:14AM -0700, Sean Christopherson wrote:
> > > > +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
> > > > +{
> > > > +	struct kvm_gfn_range range = {
> > > > +		.slot = slot,
> > > > +		.start = slot->base_gfn,
> > > > +		.end = slot->base_gfn + slot->npages,
> > > > +		.may_block = true,
> > > > +	};
> > > > +	bool flush = false;
> > > > +
> > > > +	write_lock(&kvm->mmu_lock);
> > > > +
> > > > +	if (kvm_memslots_have_rmaps(kvm))
> > > > +		flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
> > >
> > > This, and Paolo's merged variant, break shadow paging.  As was tried in commit
> > > 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"),
> > > all shadow pages, i.e. non-leaf SPTEs, need to be zapped.  All of the accounting
> > > for a shadow page is tied to the memslot, i.e. the shadow page holds a reference
> > > to the memslot, for all intents and purposes.  Deleting the memslot without removing
> > > all relevant shadow pages results in NULL pointer derefs when tearing down the VM.
> > >
> > > Note, that commit is/was buggy, and I suspect my follow-up attempt[*] was as well.
> > >
> > > [*] https://lore.kernel.org/all/20190820200318.GA15808@xxxxxxxxxxxxxxx
> > >
> > > Rather than trying to get this functional for shadow paging (which includes nested
> > > TDP), I think we should scrap the quirk idea and simply make this the behavior for
> > > S-EPT and nothing else.
> > Ok. Thanks for identifying this error. Will change the code accordingly.
>
> For now, I think a full revert of the entire series makes sense.  Irrespective of
> this bug, I don't think KVM should commit to specific implementation behavior,
> i.e. KVM shouldn't explicitly say only leaf SPTEs are zapped.
> The quirk docs should instead say that if the quirk is disabled, KVM will only
> guarantee that the deleted memslot will be inaccessible.  That way, KVM can
> still do a fast zap when it makes sense, e.g. for large memslots, do a complete
> fast zap, and for small memslots, do a targeted zap of the TDP MMU but a fast
> zap of any shadow MMUs.

For TDX, could we do as below after the full revert of this series?

void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
				   struct kvm_memory_slot *slot)
{
	/* Zaps the other roots, but will skip the mirror root. */
	kvm_mmu_zap_all_fast(kvm);
	/* Zap the memslot's leaf entries in the mirror root. */
	kvm_mmu_zap_memslot_mirror_leafs(kvm, slot);
}

> > BTW: an update on some findings regarding the previous bug with Nvidia GPU
> > assignment:
> > I found that after v5.19-rc1+, even with nx_huge_pages=N, the bug is not
> > reproducible when only leaf entries of the memslot are zapped.
> > (No more detailed info due to limited time to debug.)
>
> Heh, I've given up hope on ever finding a root cause for that issue.