On Wed, May 15, 2024 at 03:32:31PM -0700, Sean Christopherson wrote: > On Fri, May 10, 2024, Michael Roth wrote: > > Implement a platform hook to do the work of restoring the direct map > > entries of gmem-managed pages and transitioning the corresponding RMP > > table entries back to the default shared/hypervisor-owned state. > > ... > > > +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) > > +{ > > + kvm_pfn_t pfn; > > + > > + pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end); > > + > > + for (pfn = start; pfn < end;) { > > + bool use_2m_update = false; > > + int rc, rmp_level; > > + bool assigned; > > + > > + rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level); > > + if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n", > > + pfn, rc)) > > + goto next_pfn; > > This is comically trivial to hit, as it fires when running guest_memfd_test on a > !SNP host. Presumably the correct fix is to simply do nothing for !sev_snp_guest(), > but that's easier said than done due to the lack of a @kvm in .gmem_invalidate(). Yah, the code assumes that SNP is the only SVM user that would use gmem pages. Unfortunately KVM_X86_SW_PROTECTED_VM is the one other situation where this can be the case. The minimal fix would be to squash the below into this patch: diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 176ba117413a..56b0b59b8263 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -4675,6 +4675,9 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) { kvm_pfn_t pfn; + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) + return; + pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end); for (pfn = start; pfn < end;) { It's not perfect because the callback will still run for KVM_X86_SW_PROTECTED_VM if SNP is enabled, but in the context of KVM_X86_SW_PROTECTED_VM being a stand-in for testing SNP/TDX, that might not be such a bad thing. Longer term if we need something more robust would be to modify the .free_folio callback path to pass along folio->mapping, or switch to something else that provides similar functionality. Another approach might be to set .free_folio dynamically based on the vm_type of the gmem user when creating the gmem instance. > > That too is not a big fix, but that's beside the point. IMO, the fact that I'm > the first person to (completely inadvertantly) hit this rather basic bug is a > good hint that we should wait until 6.11 to merge SNP support. We do regular testing of normal guests with/without SNP enabled, but unfortunately we've only been doing KST runs on SNP-enabled hosts. I've retested with the above fix and everything looks good with SVM/SEV/SEV-ES/SNP/selftests with and without SNP enabled, but I understand if we still have reservations after this. -Mike