On Thu, Apr 04, 2024 at 02:50:31PM -0400, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
> prepare newly-allocated gmem pages prior to mapping them into the guest.
> In the case of SEV-SNP, this mainly involves setting the pages to
> private in the RMP table.
>
> However, for the GPA ranges comprising the initial guest payload, which
> are encrypted/measured prior to starting the guest, the gmem pages need
> to be accessed prior to setting them to private in the RMP table so they
> can be initialized with the userspace-provided data. Additionally, an
> SNP firmware call is needed afterward to encrypt them in-place and
> measure the contents into the guest's launch digest.
>
> While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
> this handling can be done in an open-coded/vendor-specific manner, this
> may expose more gmem-internal state/dependencies to external callers
> than necessary. Try to avoid this by implementing an interface that
> tries to handle as much of the common functionality inside gmem as
> possible, while also making it generic enough to potentially be
> usable/extensible for TDX as well.

I explored how TDX would use this hook. The outcome is that TDX does not
use it, and instead uses kvm_tdp_mmu_get_walk() with a twist; the patch
is below.

Because SEV-SNP manages the RMP, which is not tied directly to the NPT,
SEV-SNP can ignore the TDP MMU page tables when updating the RMP. TDX,
on the other hand, essentially updates the Secure EPT when it adds a
page to the guest with TDH.MEM.PAGE.ADD(), so it needs to protect the
KVM TDP MMU page tables with mmu_lock, not the guest_memfd file mapping
with its invalidate_lock. The hook therefore doesn't apply well to TDX.

The resulting KVM_TDX_INIT_MEM_REGION logic is as follows (a rough
caller-side sketch is appended after the patch):

  get_user_pages_fast(source addr)
  read_lock(mmu_lock)
  kvm_tdp_mmu_get_walk_private_pfn(vcpu, gpa, &pfn);
  if the page table doesn't map gpa, error.
  TDH.MEM.PAGE.ADD()
  TDH.MR.EXTEND()
  read_unlock(mmu_lock)
  put_page()

From 7d4024049b51969a2431805c2117992fc7ec0981 Mon Sep 17 00:00:00 2001
Message-ID: <7d4024049b51969a2431805c2117992fc7ec0981.1713913379.git.isaku.yamahata@xxxxxxxxx>
In-Reply-To: <cover.1713913379.git.isaku.yamahata@xxxxxxxxx>
References: <cover.1713913379.git.isaku.yamahata@xxxxxxxxx>
From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
Date: Tue, 23 Apr 2024 11:33:44 -0700
Subject: [PATCH] KVM: x86/tdp_mmu: Add a helper function to walk down the
 TDP MMU

KVM_TDX_INIT_MEM_REGION needs to check if the given GFN is already
populated. Add wrapping logic to kvm_tdp_mmu_get_walk() to export it.

Alternatives are as follows; the approach of this patch was chosen as
the least intrusive change.
- Refactor the KVM page fault handler into a populating part and an
  unlock function: the page fault handler populates the mapping while
  keeping mmu_lock held, TDH.MEM.PAGE.ADD() is issued, then the lock is
  released.
- Add a callback function to struct kvm_page_fault and call it after the
  page fault handler, before unlocking mmu_lock and releasing the PFN.
Based on the feedback of
https://lore.kernel.org/kvm/ZfBkle1eZFfjPI8l@xxxxxxxxxx/
https://lore.kernel.org/kvm/Zh8DHbb8FzoVErgX@xxxxxxxxxx/

Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
---
 arch/x86/kvm/mmu.h         |  3 +++
 arch/x86/kvm/mmu/tdp_mmu.c | 44 ++++++++++++++++++++++++++++++++------
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 712e9408f634..4f61f4b9fd64 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -287,6 +287,9 @@ extern bool tdp_mmu_enabled;
 #define tdp_mmu_enabled false
 #endif
 
+int kvm_tdp_mmu_get_walk_private_pfn(struct kvm_vcpu *vcpu, u64 gpa,
+                                     kvm_pfn_t *pfn);
+
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
         return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3592ae4e485f..bafcd8aeb3b3 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -2035,14 +2035,25 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
  *
  * Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}.
  */
-int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
-                         int *root_level)
+static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+                                  bool is_private)
 {
         struct tdp_iter iter;
         struct kvm_mmu *mmu = vcpu->arch.mmu;
         gfn_t gfn = addr >> PAGE_SHIFT;
         int leaf = -1;
 
+        tdp_mmu_for_each_pte(iter, mmu, is_private, gfn, gfn + 1) {
+                leaf = iter.level;
+                sptes[leaf] = iter.old_spte;
+        }
+
+        return leaf;
+}
+
+int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+                         int *root_level)
+{
         *root_level = vcpu->arch.mmu->root_role.level;
 
         /*
@@ -2050,15 +2061,34 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
          * instructions in protected guest memory can't be parsed by VMM.
          */
         if (WARN_ON_ONCE(kvm_gfn_shared_mask(vcpu->kvm)))
-                return leaf;
+                return -1;
 
-        tdp_mmu_for_each_pte(iter, mmu, false, gfn, gfn + 1) {
-                leaf = iter.level;
-                sptes[leaf] = iter.old_spte;
+        return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, false);
+}
+
+int kvm_tdp_mmu_get_walk_private_pfn(struct kvm_vcpu *vcpu, u64 gpa,
+                                     kvm_pfn_t *pfn)
+{
+        u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
+        int leaf;
+
+        lockdep_assert_held(&vcpu->kvm->mmu_lock);
+
+        kvm_tdp_mmu_walk_lockless_begin();
+        leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, true);
+        kvm_tdp_mmu_walk_lockless_end();
+        if (leaf < 0)
+                return -ENOENT;
+
+        spte = sptes[leaf];
+        if (is_shadow_present_pte(spte) && is_last_spte(spte, leaf)) {
+                *pfn = spte_to_pfn(spte);
+                return leaf;
         }
 
-        return leaf;
+        return -ENOENT;
 }
+EXPORT_SYMBOL_GPL(kvm_tdp_mmu_get_walk_private_pfn);
 
 /*
  * Returns the last level spte pointer of the shadow page walk for the given
-- 
2.43.2

-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
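
For illustration only, here is a minimal caller-side sketch of the
KVM_TDX_INIT_MEM_REGION flow outlined above, using the new helper. The
names tdx_init_mem_region(), tdh_mem_page_add() and tdh_mr_extend() are
placeholders assumed for this sketch, not code from the actual TDX
series, and error handling is reduced to the essentials.

/*
 * Hypothetical sketch only: how the KVM_TDX_INIT_MEM_REGION flow above
 * might call kvm_tdp_mmu_get_walk_private_pfn().
 */
#include <linux/kvm_host.h>
#include <linux/mm.h>

/* Placeholder prototypes; the real SEAMCALL wrappers look different. */
int tdh_mem_page_add(struct kvm *kvm, gpa_t gpa, kvm_pfn_t pfn, kvm_pfn_t src_pfn);
int tdh_mr_extend(struct kvm *kvm, gpa_t gpa);

static int tdx_init_mem_region(struct kvm_vcpu *vcpu, gpa_t gpa,
                               void __user *source, bool extend)
{
        struct kvm *kvm = vcpu->kvm;
        struct page *src_page;
        kvm_pfn_t pfn;
        int ret, level;

        /* Pin the userspace page that holds the initial contents. */
        ret = get_user_pages_fast((unsigned long)source, 1, 0, &src_page);
        if (ret != 1)
                return ret < 0 ? ret : -ENOMEM;

        /* mmu_lock is an rwlock on x86; the walk only needs it for read. */
        read_lock(&kvm->mmu_lock);

        /* Error out if the TDP MMU doesn't already map this private GPA. */
        level = kvm_tdp_mmu_get_walk_private_pfn(vcpu, gpa, &pfn);
        if (level < 0) {
                ret = level;
                goto out;
        }

        /* Add the page to the TD and, if requested, extend the measurement. */
        ret = tdh_mem_page_add(kvm, gpa, pfn, page_to_pfn(src_page));
        if (!ret && extend)
                ret = tdh_mr_extend(kvm, gpa);

out:
        read_unlock(&kvm->mmu_lock);
        put_page(src_page);
        return ret;
}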