On Mon, May 08, 2023 at 03:19:56PM +0800, Chao Gao wrote:
> On Mon, May 08, 2023 at 11:47:00AM +0800, Yan Zhao wrote:
> >Zap all TDP leaf entries when noncoherent DMA count goes from 0 to !0, or
> >from !0 to 0.
> >
> >When there's no noncoherent DMA device, EPT memory type is
> >((MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT)
> >
> >When there're noncoherent DMA devices, EPT memory type needs to honor
> >guest CR0_CD and MTRR settings.
> >
> >So, if noncoherent DMA count changes between 0 and !0, EPT leaf entries
> >need to be zapped to clear stale memory type.
> >
> >This issue might be hidden when VFIO adding/removing MMIO regions of the
> >noncoherent DMA devices on device attaching/de-attaching because
> >usually the MMIO regions will be disabled/enabled for several times during
> >guest PCI probing. And in KVM, TDP entries are all zapped on memslot
> >removal.
> >
> >However, this issue may appear when kvm_mmu_zap_all_fast() is not called
> >before KVM slot removal, e.g. as for TDX, only leaf entries for the
> >memslot to be removed is zapped.
> >
> >static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> >			struct kvm_memory_slot *slot,
> >			struct kvm_page_track_notifier_node *node)
> >{
> >	if (kvm_gfn_shared_mask(kvm))
> >		/*
> >		 * Secure-EPT requires to release PTs from the leaf. The
> >		 * optimization to zap root PT first with child PT doesn't
> >		 * work.
> >		 */
> >		kvm_mmu_zap_memslot(kvm, slot);
> >	else
> >		kvm_mmu_zap_all_fast(kvm);
> >}
>
> TDX code isn't merged. So, I think you'd better not use TDX as an argument.
>
Ok. But I just want to explain that kvm_mmu_zap_all_fast() is not desired in
some cases during slot DELETE. TDX is a good example here.

> >
> >And even without TDX's case, in some extreme conditions if MMIO regions
> >are not disabled during device attaching, e.g. if guest does not cause
> >the MMIO region disabling in QEMU.
> >Then TDP zap will not be called and wrong EPT memory type might be
> >retained.
> >
> >So, do the TDP zapping of all leaf entries when present/non-present state
> >of noncoherent DMA devices changes to ensure stale entries cleaned away.
> >And as this is not a frequent operation, the extra zap should be fine.
> >
> >Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> >---
> > arch/x86/kvm/x86.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index e7f78fe79b32..99a825722d95 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -13145,13 +13145,15 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
> >
> > void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> > {
> >-	atomic_inc(&kvm->arch.noncoherent_dma_count);
> >+	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
> >+		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
>
> The issue is specific to EPT. shouldn't this be conditional on tdp_enabled, like
> update_mtrr()?
>
Yes. good point.
Maybe also include checking of shadow_memtype_mask.

> Likewise, shouldn't we omit to call kvm_zap_gfn_range() in kvm_post_set_cr0() if
> tdp_enabled is false?
I think so.
And also check tdp_enabled and shadow_memtype_mask in the case of update_mtrr().
Will add a helper function in next version.

Thanks, Chao!
>
> > }
> > EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
> >
> > void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
> > {
> >-	atomic_dec(&kvm->arch.noncoherent_dma_count);
> >+	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
> >+		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> > }
> > EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
> >
> >--
> >2.17.1
> >
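
To illustrate the transition check discussed in this thread, here is a rough
userspace-only model, not the helper from any later version of the patch. It
uses C11 atomics in place of the kernel's atomic_inc_return()/
atomic_dec_return(). The struct and every model_* name are hypothetical. The
tdp_enabled and memtype_mask fields merely stand in for the kernel's
tdp_enabled and shadow_memtype_mask, and the zap is simulated by a counter
rather than a call to kvm_zap_gfn_range().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the state involved; not kernel code. */
struct vm_model {
	atomic_int noncoherent_dma_count; /* models kvm->arch.noncoherent_dma_count */
	bool tdp_enabled;                 /* stands in for tdp_enabled */
	unsigned long memtype_mask;       /* stands in for shadow_memtype_mask */
	int zap_calls;                    /* counts simulated zaps of all leaf entries */
};

/*
 * Hypothetical helper: a zap only matters when TDP is in use and the
 * page-table entries actually carry memory type bits.
 */
static void model_zap_leafs_if_needed(struct vm_model *vm)
{
	if (vm->tdp_enabled && vm->memtype_mask)
		vm->zap_calls++;
}

static void model_register_noncoherent_dma(struct vm_model *vm)
{
	/* atomic_fetch_add() returns the old value: old == 0 means 0 -> 1. */
	if (atomic_fetch_add(&vm->noncoherent_dma_count, 1) == 0)
		model_zap_leafs_if_needed(vm);
}

static void model_unregister_noncoherent_dma(struct vm_model *vm)
{
	int old = atomic_fetch_sub(&vm->noncoherent_dma_count, 1);

	assert(old > 0); /* unbalanced unregister would be a caller bug */
	/* old == 1 means the count just went 1 -> 0. */
	if (old == 1)
		model_zap_leafs_if_needed(vm);
}
```

In this model only the first register and the last unregister trigger a zap,
and neither does when tdp_enabled is false or memtype_mask is zero.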