On Mon, May 08, 2023 at 11:47:00AM +0800, Yan Zhao wrote:
>Zap all TDP leaf entries when noncoherent DMA count goes from 0 to !0, or
>from !0 to 0.
>
>When there's no noncoherent DMA device, EPT memory type is
>((MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT)
>
>When there're noncoherent DMA devices, EPT memory type needs to honor
>guest CR0_CD and MTRR settings.
>
>So, if noncoherent DMA count changes between 0 and !0, EPT leaf entries
>need to be zapped to clear stale memory type.
>
>This issue might be hidden when VFIO adding/removing MMIO regions of the
>noncoherent DMA devices on device attaching/de-attaching because
>usually the MMIO regions will be disabled/enabled for several times during
>guest PCI probing. And in KVM, TDP entries are all zapped on memslot
>removal.
>
>However, this issue may appear when kvm_mmu_zap_all_fast() is not called
>before KVM slot removal, e.g. as for TDX, only leaf entries for the
>memslot to be removed is zapped.
>
>static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
>			struct kvm_memory_slot *slot,
>			struct kvm_page_track_notifier_node *node)
>{
>	if (kvm_gfn_shared_mask(kvm))
>		/*
>		 * Secure-EPT requires to release PTs from the leaf. The
>		 * optimization to zap root PT first with child PT doesn't
>		 * work.
>		 */
>		kvm_mmu_zap_memslot(kvm, slot);
>	else
>		kvm_mmu_zap_all_fast(kvm);
>}

TDX code isn't merged. So, I think you'd better not use TDX as an argument.

>
>And even without TDX's case, in some extreme conditions if MMIO regions
>are not disabled during device attaching, e.g. if guest does not cause
>the MMIO region disabling in QEMU.
>Then TDP zap will not be called and wrong EPT memory type might be
>retained.
>
>So, do the TDP zapping of all leaf entries when present/non-present state
>of noncoherent DMA devices changes to ensure stale entries cleaned away.
>And as this is not a frequent operation, the extra zap should be fine.
>
>Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
>---
> arch/x86/kvm/x86.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index e7f78fe79b32..99a825722d95 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -13145,13 +13145,15 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
>
> void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> {
>-	atomic_inc(&kvm->arch.noncoherent_dma_count);
>+	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
>+		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));

The issue is specific to EPT. Shouldn't this be conditional on tdp_enabled,
like update_mtrr()? Likewise, shouldn't we skip the kvm_zap_gfn_range() call
in kvm_post_set_cr0() if tdp_enabled is false?

> }
> EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
>
> void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
> {
>-	atomic_dec(&kvm->arch.noncoherent_dma_count);
>+	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
>+		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> }
> EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
>
>--
>2.17.1
>
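
To make the tdp_enabled question concrete, here is a rough userspace model of the
0 <-> !0 transition check with the zap gated on TDP. It is only a sketch, not
kernel code: struct vm_model, zap_all_leaf_entries() and the tdp_enabled field
are hypothetical stand-ins, and C11 atomic_fetch_add()/atomic_fetch_sub() are
used in place of the kernel's atomic_inc_return()/atomic_dec_return() (they
return the old value rather than the new one, so the comparisons shift by one).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-VM state touched by the patch. */
struct vm_model {
	atomic_int noncoherent_dma_count; /* models kvm->arch.noncoherent_dma_count */
	bool tdp_enabled;                 /* models the tdp_enabled gate asked about */
	int zap_calls;                    /* how many full zaps were requested;
	                                   * plain int, this model is single-threaded */
};

/* Stand-in for kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)). */
static void zap_all_leaf_entries(struct vm_model *vm)
{
	vm->zap_calls++;
}

static void register_noncoherent_dma(struct vm_model *vm)
{
	/* fetch_add returns the OLD value: old == 0 is the 0 -> 1 transition,
	 * equivalent to atomic_inc_return(...) == 1 in the patch. */
	int old = atomic_fetch_add(&vm->noncoherent_dma_count, 1);

	if (old == 0 && vm->tdp_enabled)
		zap_all_leaf_entries(vm);
}

static void unregister_noncoherent_dma(struct vm_model *vm)
{
	/* old == 1 is the 1 -> 0 transition, equivalent to
	 * !atomic_dec_return(...) in the patch. */
	int old = atomic_fetch_sub(&vm->noncoherent_dma_count, 1);

	assert(old > 0); /* unbalanced unregister would underflow the count */
	if (old == 1 && vm->tdp_enabled)
		zap_all_leaf_entries(vm);
}
```

In this model, registering two devices and then removing both requests exactly
two zaps (on the first register and the last unregister), and none at all when
tdp_enabled is false. Whether the real code should be gated the same way is the
open question above.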