On Mon, Jul 15, 2024 at 10:57:35PM +0000, Edgecombe, Rick P wrote: > On Mon, 2024-07-15 at 16:47 -0500, Michael Roth wrote: > > Makes sense. > > > > If we document mutual exclusion between ranges touched by > > gmem_populate() vs ranges touched by actual userspace issuance of > > KVM_PRE_FAULT_MEMORY (and make sure nothing too crazy happens if users > > don't abide by the documentation), then I think most problems go away... > > > > But there is still at least one awkward constraint for SNP: > > KVM_PRE_FAULT_MEMORY cannot be called for private GPA ranges until after > > SNP_LAUNCH_START is called. This is true even if the GPA range is not > > one of the ranges that will get passed to > > gmem_populate()/SNP_LAUNCH_UPDATE. The reason for this is that when > > binding the ASID to the SNP context as part of SNP_LAUNCH_START, firmware > > will perform checks to make sure that ASID is not already being used in > > the RMP table, and that check will fail if KVM_PRE_FAULT_MEMORY triggered > > for a private page before calling SNP_LAUNCH_START. > > > > So we have this constraint that KVM_PRE_FAULT_MEMORY can't be issued > > before SNP_LAUNCH_START. So it makes me wonder if we should just broaden > > that constraint and for now just disallow KVM_PRE_FAULT_MEMORY prior to > > finalizing a guest, since it'll be easier to lift that restriction later > > versus discovering some other sort of edge case and need to > > retroactively place restrictions. > > > > I've taken Isaku's original pre_fault_memory_test and added a new > > x86-specific coco_pre_fault_memory_test to try to better document and > > exercise these corner cases for SEV and SNP, but I'm hoping it could > > also be useful for TDX (hence the generic name). These are based on > > Pratik's initial SNP selftests (which are in turn based on kvm/queue + > > these patches): > > > > https://github.com/mdroth/linux/commits/snp-uptodate0-kst/ > > > > https://github.com/mdroth/linux/commit/dd7d4b1983ceeb653132cfd54ad63f597db85f74 > > > > > > From the TDX side it wouldn't be horrible to not have to worry about userspace > mucking around with the mirrored page tables in unexpected ways during the > special period. TDX already has its own "finalized" state in kvm_tdx, is there > something similar on the SEV side we could unify with? Unfortunately there isn't currently anything in place like that for SNP, but if we had a common 'finalized' field somewhere it could easily be set in snp_launch_finish() as well. > > I looked at moving from kvm_arch_vcpu_pre_fault_memory() to directly calling > kvm_tdp_map_page(), so we could potentially put whatever check in > kvm_arch_vcpu_pre_fault_memory(). It required a couple exports: Thanks. I'll give this a spin for SNP as well. -Mike > > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h > index 03737f3aaeeb..9004ac597a85 100644 > --- a/arch/x86/kvm/mmu.h > +++ b/arch/x86/kvm/mmu.h > @@ -277,6 +277,7 @@ extern bool tdp_mmu_enabled; > #endif > > int kvm_tdp_mmu_get_walk_mirror_pfn(struct kvm_vcpu *vcpu, u64 gpa, kvm_pfn_t > *pfn); > +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 > *level); > > static inline bool kvm_memslots_have_rmaps(struct kvm *kvm) > { > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index 7bb6b17b455f..4a3e471ec9fe 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -4721,8 +4721,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct > kvm_page_fault *fault) > return direct_page_fault(vcpu, fault); > } > > -static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, > - u8 *level) > +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 > *level) > { > int r; > > @@ -4759,6 +4758,7 @@ static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t > gpa, u64 error_code, > return -EIO; > } > } > +EXPORT_SYMBOL_GPL(kvm_tdp_map_page); > > long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, > struct kvm_pre_fault_memory *range) > @@ -5770,6 +5770,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) > out: > return r; > } > +EXPORT_SYMBOL_GPL(kvm_mmu_load); > > void kvm_mmu_unload(struct kvm_vcpu *vcpu) > { > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index 9ac0821eb44b..7161ef68f3da 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -2809,11 +2809,13 @@ struct tdx_gmem_post_populate_arg { > static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, > void __user *src, int order, void *_arg) > { > + u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS; > struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > struct tdx_gmem_post_populate_arg *arg = _arg; > struct kvm_vcpu *vcpu = arg->vcpu; > struct kvm_memory_slot *slot; > gpa_t gpa = gfn_to_gpa(gfn); > + u8 level = PG_LEVEL_4K; > struct page *page; > kvm_pfn_t mmu_pfn; > int ret, i; > @@ -2832,6 +2834,10 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t > gfn, kvm_pfn_t pfn, > goto out_put_page; > } > > + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level); > + if (ret < 0) > + goto out_put_page; > + > read_lock(&kvm->mmu_lock); > > ret = kvm_tdp_mmu_get_walk_mirror_pfn(vcpu, gpa, &mmu_pfn); > @@ -2910,6 +2916,7 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, > struct kvm_tdx_cmd *c > mutex_lock(&kvm->slots_lock); > idx = srcu_read_lock(&kvm->srcu); > > + kvm_mmu_reload(vcpu); > ret = 0; > while (region.nr_pages) { > if (signal_pending(current)) { >