On Tue, Apr 23, 2024, Isaku Yamahata wrote:
> On Thu, Apr 04, 2024 at 02:50:31PM -0400,
> Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> 
> > During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
> > prepare newly-allocated gmem pages prior to mapping them into the guest.
> > In the case of SEV-SNP, this mainly involves setting the pages to
> > private in the RMP table.
> > 
> > However, for the GPA ranges comprising the initial guest payload, which
> > are encrypted/measured prior to starting the guest, the gmem pages need
> > to be accessed prior to setting them to private in the RMP table so they
> > can be initialized with the userspace-provided data. Additionally, an
> > SNP firmware call is needed afterward to encrypt them in-place and
> > measure the contents into the guest's launch digest.
> > 
> > While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
> > this handling can be done in an open-coded/vendor-specific manner, this
> > may expose more gmem-internal state/dependencies to external callers
> > than necessary. Try to avoid this by implementing an interface that
> > tries to handle as much of the common functionality inside gmem as
> > possible, while also making it generic enough to potentially be
> > usable/extensible for TDX as well.
> 
> I explored how TDX will use this hook, but it resulted in not using this
> hook, and instead using kvm_tdp_mmu_get_walk() with a twist. The patch is
> below.
> 
> Because SEV-SNP manages the RMP, which is not tied to the NPT directly,
> SEV-SNP can ignore TDP MMU page tables when updating the RMP.
> On the other hand, TDX essentially updates the Secure-EPT when it adds a
> page to the guest via TDH.MEM.PAGE.ADD(). It needs to protect KVM TDP MMU
> page tables with mmu_lock, not the guest memfd file mapping with
> invalidate_lock. The hook doesn't apply to TDX well. The resulting
> KVM_TDX_INIT_MEM_REGION logic is as follows.
> 
>   get_user_pages_fast(source addr)
>   read_lock(mmu_lock)
>   kvm_tdp_mmu_get_walk_private_pfn(vcpu, gpa, &pfn);
>   if the page table doesn't map gpa, error.
>   TDH.MEM.PAGE.ADD()
>   TDH.MR.EXTEND()
>   read_unlock(mmu_lock)
>   put_page()

Hmm, KVM doesn't _need_ to use invalidate_lock to protect against
guest_memfd invalidation, but I also don't see why it would cause problems.
I.e. why not take mmu_lock() in TDX's post_populate() implementation?  That
would allow having a sanity check that the PFN that guest_memfd() has is
indeed the PFN that KVM's S-EPT mirror has, i.e. the PFN that KVM is going
to PAGE.ADD.

> >From 7d4024049b51969a2431805c2117992fc7ec0981 Mon Sep 17 00:00:00 2001
> Message-ID: <7d4024049b51969a2431805c2117992fc7ec0981.1713913379.git.isaku.yamahata@xxxxxxxxx>
> In-Reply-To: <cover.1713913379.git.isaku.yamahata@xxxxxxxxx>
> References: <cover.1713913379.git.isaku.yamahata@xxxxxxxxx>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Date: Tue, 23 Apr 2024 11:33:44 -0700
> Subject: [PATCH] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU
> 
> KVM_TDX_INIT_MEM_REGION needs to check if the given GFN is already
> populated. Add wrapping logic to kvm_tdp_mmu_get_walk() to export it.
> 
> +int kvm_tdp_mmu_get_walk_private_pfn(struct kvm_vcpu *vcpu, u64 gpa,
> +				     kvm_pfn_t *pfn)
> +{
> +	u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
> +	int leaf;
> +
> +	lockdep_assert_held(&vcpu->kvm->mmu_lock);
> +
> +	kvm_tdp_mmu_walk_lockless_begin();

This is obviously not a lockless walk.
> +	leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, true);
> +	kvm_tdp_mmu_walk_lockless_end();
> +	if (leaf < 0)
> +		return -ENOENT;
> +
> +	spte = sptes[leaf];
> +	if (is_shadow_present_pte(spte) && is_last_spte(spte, leaf)) {
> +		*pfn = spte_to_pfn(spte);
> +		return leaf;
>  	}
>  
> -	return leaf;
> +	return -ENOENT;
>  }
> +EXPORT_SYMBOL_GPL(kvm_tdp_mmu_get_walk_private_pfn);
>  
>  /*
>   * Returns the last level spte pointer of the shadow page walk for the given
> -- 
> 2.43.2
> 
> -- 
> Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
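To illustrate the alternative suggested above: a TDX post-populate callback
could take mmu_lock itself, walk the mirror page tables, and sanity check
that the PFN guest_memfd handed in matches the PFN the S-EPT mirror maps,
before issuing TDH.MEM.PAGE.ADD.  This is a hypothetical sketch only, not
actual kernel code; the callback signature and the tdx_mem_page_add() /
tdx_mr_extend() helpers are assumptions for illustration:

```c
/*
 * Hypothetical sketch of a TDX post-populate callback that takes
 * mmu_lock itself.  The signature and the tdx_mem_page_add() /
 * tdx_mr_extend() helpers are illustrative assumptions.
 */
static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
				  void __user *src, void *opaque)
{
	struct kvm_vcpu *vcpu = opaque;
	kvm_pfn_t mmu_pfn;
	int leaf, r;

	read_lock(&kvm->mmu_lock);

	leaf = kvm_tdp_mmu_get_walk_private_pfn(vcpu, gfn_to_gpa(gfn),
						&mmu_pfn);
	if (leaf < 0) {
		r = -ENOENT;
		goto out;
	}

	/*
	 * Sanity check: the PFN guest_memfd allocated must be the PFN
	 * that KVM's S-EPT mirror maps, i.e. the PFN PAGE.ADD will use.
	 */
	if (WARN_ON_ONCE(mmu_pfn != pfn)) {
		r = -EIO;
		goto out;
	}

	r = tdx_mem_page_add(kvm, gfn, pfn, src);
	if (!r)
		r = tdx_mr_extend(kvm, gfn);
out:
	read_unlock(&kvm->mmu_lock);
	return r;
}
```

With this shape, the guest_memfd invalidate_lock is not needed for
correctness; mmu_lock alone serializes the walk against S-EPT zaps.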