On Mon, 2024-02-26 at 00:26 -0800, isaku.yamahata@xxxxxxxxx wrote:
> From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>
> TDX supports only write-back(WB) memory type for private memory
> architecturally so that (virtualized) memory type change doesn't make
> sense for private memory. Also currently, page migration isn't
> supported for TDX yet. (TDX architecturally supports page migration.
> it's KVM and kernel implementation issue.)
>
> Regarding memory type change (mtrr virtualization and lapic page
> mapping change), pages are zapped by kvm_zap_gfn_range(). On the next
> KVM page fault, the SPTE entry with a new memory type for the page is
> populated. Regarding page migration, pages are zapped by the mmu
> notifier. On the next KVM page fault, the new migrated page is
> populated. Don't zap private pages on unmapping for those two cases.

Is the migration case relevant to TDX?

>
> When deleting/moving a KVM memory slot, zap private pages. Typically
> tearing down VM. Don't invalidate private page tables. i.e. zap only
> leaf SPTEs for KVM mmu that has a shared bit mask. The existing
> kvm_tdp_mmu_invalidate_all_roots() depends on role.invalid with
> read-lock of mmu_lock so that other vcpu can operate on KVM mmu
> concurrently. It marks the root page table invalid and zaps SPTEs of
> the root page tables. The TDX module doesn't allow to unlink a
> protected root page table from the hardware and then allocate a new
> one for it. i.e. replacing a protected root page table. Instead, zap
> only leaf SPTEs for KVM mmu with a shared bit mask set.

I get the part about only zapping leafs and not the root and mid-level
PTEs. But why the MTRR, lapic page and migration part? Why should those
not be zapped? Why is migration a consideration when it is not
supported?

>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/mmu.c     | 61 ++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/mmu/tdp_mmu.c | 37 +++++++++++++++++++----
>  arch/x86/kvm/mmu/tdp_mmu.h |  5 ++--
>  3 files changed, 92 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 0d6d4506ec97..30c86e858ae4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6339,7 +6339,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
>  	 * e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and yield.
>  	 */
>  	if (tdp_mmu_enabled)
> -		kvm_tdp_mmu_invalidate_all_roots(kvm);
> +		kvm_tdp_mmu_invalidate_all_roots(kvm, true);
>
>  	/*
>  	 * Notify all vcpus to reload its shadow page table and flush TLB.
> @@ -6459,7 +6459,16 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>  		flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
>
>  	if (tdp_mmu_enabled)
> -		flush = kvm_tdp_mmu_zap_leafs(kvm, gfn_start, gfn_end, flush);
> +		/*
> +		 * zap_private = false. Zap only shared pages.
> +		 *
> +		 * kvm_zap_gfn_range() is used when MTRR or PAT memory
> +		 * type was changed. Later on the next kvm page fault,
> +		 * populate it with updated spte entry.
> +		 * Because only WB is supported for private pages, don't
> +		 * care of private pages.
> +		 */
> +		flush = kvm_tdp_mmu_zap_leafs(kvm, gfn_start, gfn_end, flush, false);
>
>  	if (flush)
>  		kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);
> @@ -6905,10 +6914,56 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>  	kvm_mmu_zap_all(kvm);
>  }
>
> +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)

What about kvm_mmu_zap_memslot_leafs() instead?

> +{
> +	bool flush = false;

It doesn't need to be initialized if it passes false directly into
kvm_tdp_mmu_unmap_gfn_range(). It would make the code easier to
understand.

> +
> +	write_lock(&kvm->mmu_lock);
> +
> +	/*
> +	 * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> +	 * case scenario we'll have unused shadow pages lying around until they
> +	 * are recycled due to age or when the VM is destroyed.
> +	 */
> +	if (tdp_mmu_enabled) {
> +		struct kvm_gfn_range range = {
> +			.slot = slot,
> +			.start = slot->base_gfn,
> +			.end = slot->base_gfn + slot->npages,
> +			.may_block = true,
> +
> +			/*
> +			 * This handles both private gfn and shared gfn.
> +			 * All private page should be zapped on memslot deletion.
> +			 */
> +			.only_private = true,
> +			.only_shared = true,

only_private and only_shared are both true? Shouldn't they both be
false? (or just unset) See the sketch at the end of this mail for what
I mean.

> +		};
> +
> +		flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush);
> +	} else {
> +		/* TDX supports only TDP-MMU case. */
> +		WARN_ON_ONCE(1);

How about a KVM_BUG_ON() instead? If somehow this is reached, we don't
want the caller thinking the pages are zapped, then enter the guest
with pages mapped that have gone elsewhere. (Also sketched at the end
of this mail.)

> +		flush = true;

Why flush?

> +	}
> +	if (flush)
> +		kvm_flush_remote_tlbs(kvm);
> +
> +	write_unlock(&kvm->mmu_lock);
> +}
> +
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> -	kvm_mmu_zap_all_fast(kvm);
> +	if (kvm_gfn_shared_mask(kvm))

There seems to be an attempt to abstract away the existence of
Secure-EPT in mmu.c that is not fully successful. In this case the
code checks kvm_gfn_shared_mask() to see if it needs to handle the
zapping in a way specifically needed by S-EPT. It ends up being a
little confusing because the actual check is about whether there is a
shared bit. It only works because S-EPT is the only thing that has a
kvm_gfn_shared_mask().

Doing something like (kvm->arch.vm_type == KVM_X86_TDX_VM) looks wrong,
but is more honest about what we are getting up to here. I'm not sure
though, what do you think?

> +		/*
> +		 * Secure-EPT requires to release PTs from the leaf. The
> +		 * optimization to zap root PT first with child PT doesn't
> +		 * work.
> +		 */
> +		kvm_mmu_zap_memslot(kvm, slot);
> +	else
> +		kvm_mmu_zap_all_fast(kvm);
>  }
>
>  void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index d47f0daf1b03..e7514a807134 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -37,7 +37,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>  	 * for zapping and thus puts the TDP MMU's reference to each root, i.e.
>  	 * ultimately frees all roots.
>  	 */
> -	kvm_tdp_mmu_invalidate_all_roots(kvm);
> +	kvm_tdp_mmu_invalidate_all_roots(kvm, false);
>  	kvm_tdp_mmu_zap_invalidated_roots(kvm);
>
>  	WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
> @@ -771,7 +771,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>  	 * operation can cause a soft lockup.
>  	 */
>  static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> -			      gfn_t start, gfn_t end, bool can_yield, bool flush)
> +			      gfn_t start, gfn_t end, bool can_yield, bool flush,
> +			      bool zap_private)
>  {
>  	struct tdp_iter iter;
>
> @@ -779,6 +780,10 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
>
>  	lockdep_assert_held_write(&kvm->mmu_lock);
>
> +	WARN_ON_ONCE(zap_private && !is_private_sp(root));

All the callers have zap_private as zap_private &&
is_private_sp(root). What badness is it trying to uncover?

> +	if (!zap_private && is_private_sp(root))
> +		return false;
> +
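
To make the .only_private/.only_shared question above concrete, this is
roughly what I mean. Just a sketch against the quoted hunk, leaving both
flags unset rather than setting both to true:

		struct kvm_gfn_range range = {
			.slot = slot,
			.start = slot->base_gfn,
			.end = slot->base_gfn + slot->npages,
			.may_block = true,
			/*
			 * .only_private and .only_shared are left unset
			 * (false) so the zap covers both the private and
			 * the shared mappings of the memslot.
			 */
		};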
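
And for the KVM_BUG_ON() point, something along these lines is what I
had in mind. Untested sketch using the existing KVM_BUG_ON(cond, kvm)
helper, which also marks the VM as bugged so we can't enter the guest
assuming the zap happened:

	} else {
		/*
		 * TDX supports only the TDP MMU, so this path should be
		 * unreachable. Bug the VM rather than only warning, so the
		 * caller can't enter the guest with the memslot's pages
		 * still mapped.
		 */
		KVM_BUG_ON(1, kvm);
	}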