On Wed, Dec 14, 2022 at 11:17:32AM +0000, "Huang, Kai" <kai.huang@xxxxxxxxx> wrote: > On Sat, 2022-10-29 at 23:22 -0700, isaku.yamahata@xxxxxxxxx wrote: > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > > > TDX supports only write-back(WB) memory type for private memory > > architecturally so that (virtualized) memory type change doesn't make sense > > for private memory. Also currently, page migration isn't supported for TDX > > yet. (TDX architecturally supports page migration. it's KVM and kernel > > implementation issue.) > > > > Regarding memory type change (mtrr virtualization and lapic page mapping > > change), pages are zapped by kvm_zap_gfn_range(). On the next KVM page > > fault, the SPTE entry with a new memory type for the page is populated. > > Regarding page migration, pages are zapped by the mmu notifier. On the next > > KVM page fault, the new migrated page is populated. Don't zap private > > pages on unmapping for those two cases. > > > > When deleting/moving a KVM memory slot, zap private pages. Typically > > tearing down VM. Don't invalidate private page tables. i.e. zap only leaf > > SPTEs for KVM mmu that has a shared bit mask. The existing > > kvm_tdp_mmu_invalidate_all_roots() depends on role.invalid with read-lock > > of mmu_lock so that other vcpu can operate on KVM mmu concurrently. It > > marks the root page table invalid and zaps SPTEs of the root page > > tables. The TDX module doesn't allow to unlink a protected root page table > > from the hardware and then allocate a new one for it. i.e. replacing a > > protected root page table. Instead, zap only leaf SPTEs for KVM mmu with a > > shared bit mask set. > > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > --- > > arch/x86/kvm/mmu/mmu.c | 85 ++++++++++++++++++++++++++++++++++++-- > > arch/x86/kvm/mmu/tdp_mmu.c | 24 ++++++++--- > > arch/x86/kvm/mmu/tdp_mmu.h | 5 ++- > > 3 files changed, 103 insertions(+), 11 deletions(-) > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index faf69774c7ce..0237e143299c 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -1577,8 +1577,38 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) > > if (kvm_memslots_have_rmaps(kvm)) > > flush = kvm_handle_gfn_range(kvm, range, kvm_zap_rmap); > > > > - if (is_tdp_mmu_enabled(kvm)) > > - flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush); > > + if (is_tdp_mmu_enabled(kvm)) { > > + bool zap_private; > > + > > + if (kvm_slot_can_be_private(range->slot)) { > > + if (range->flags & KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM) > > + /* > > + * For private slot, the callback is triggered > > + * via falloc. Mode can be allocation or punch > ^ > fallocate(), please? > > > + * hole. Because the private-shared conversion > > + * is done via > > + * KVM_MEMORY_ENCRYPT_REG/UNREG_REGION, we can > > + * ignore the request from restrictedmem. > > + */ > > + return flush; > > Sorry why "private-shared conversion is done via KVM_MEMORY_ENCRYPT_REG" results > in "we can ignore the requres from restrictedmem"? > > If we punch a hole, the pages are de-allocated, correct? With v10 UPM, we can have zap_private = true always. With v9 UPM, the callback is triggered both for allocation and punch-hole without any further argument. With v10 UPM, the callback is triggered only for punching hole. > > > + else if (range->flags & KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR) { > > + if (range->attr == KVM_MEM_ATTR_SHARED) > > + zap_private = true; > > + else { > > + WARN_ON_ONCE(range->attr != KVM_MEM_ATTR_PRIVATE); > > + zap_private = false; > > + } > > + } else > > + /* > > + * kvm_unmap_gfn_range() is called via mmu > > + * notifier. For now page migration for private > > + * page isn't supported yet, don't zap private > > + * pages. > > + */ > > + zap_private = false; > > Page migration is not the only reason that KVM will receive the MMU notifer -- > just say something like "for now all private pages are pinned during VM's life > time". Will update the comment. > > > > + } > > + flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush, zap_private); > > + } > > > > return flush; > > } > > @@ -6066,11 +6096,48 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm) > > return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages)); > > } > > > > +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) > > +{ > > + bool flush = false; > > + > > + write_lock(&kvm->mmu_lock); > > + > > + /* > > + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst > > + * case scenario we'll have unused shadow pages lying around until they > > + * are recycled due to age or when the VM is destroyed. > > + */ > > + if (is_tdp_mmu_enabled(kvm)) { > > + struct kvm_gfn_range range = { > > + .slot = slot, > > + .start = slot->base_gfn, > > + .end = slot->base_gfn + slot->npages, > > + .may_block = false, > > + }; > > + > > + /* > > + * this handles both private gfn and shared gfn. > > + * All private page should be zapped on memslot deletion. > > + */ > > + flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush, true); > > + } else { > > + flush = slot_handle_level(kvm, slot, __kvm_zap_rmap, PG_LEVEL_4K, > > + KVM_MAX_HUGEPAGE_LEVEL, true); > > + } > > + if (flush) > > + kvm_flush_remote_tlbs(kvm); > > + > > + write_unlock(&kvm->mmu_lock); > > +} > > + > > static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, > > struct kvm_memory_slot *slot, > > struct kvm_page_track_notifier_node *node) > > { > > - kvm_mmu_zap_all_fast(kvm); > > + if (kvm_gfn_shared_mask(kvm)) > > + kvm_mmu_zap_memslot(kvm, slot); > > + else > > + kvm_mmu_zap_all_fast(kvm); > > } > > A comment would be nice here. Will add a comment. > > > > int kvm_mmu_init_vm(struct kvm *kvm) > > @@ -6173,8 +6240,18 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) > > > > if (is_tdp_mmu_enabled(kvm)) { > > for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) > > + /* > > + * zap_private = true. Zap both private/shared pages. > > + * > > + * kvm_zap_gfn_range() is used when PAT memory type was > > Is it PAT or MTRR, or both (thus just memory type)? Both. Will update the comment. > > > + * changed. Later on the next kvm page fault, populate > > + * it with updated spte entry. > > + * Because only WB is supported for private pages, don't > > + * care of private pages. > > + */ > > Then why bother zapping private? If I read correctly, the changelog says "don't > zap private"? Right. Will fix. > > flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start, > > - gfn_end, true, flush); > > + gfn_end, true, flush, > > + true); > > } > > > > Btw, as you mentioned in the changelog, private memory always has WB memory > type, thus cannot be virtualized. Is it better to modify update_mtrr() to just > return early if the gfn range is purely private? MTRR support in cpuid is fixed to 1, PAT in cpuid is native. MTRR and PAT are supported on shared pages. > IMHO the handling of MTRR/PAT virtualization for TDX guest deserves dedicated > patch(es) to put them together so it's easier to review. Now the relevant parts > spread in multiple independent patches (MSR handling, vt_get_mt_mask(), etc). Ok, let me check it. -- Isaku Yamahata <isaku.yamahata@xxxxxxxxx>