On Tue, May 21, 2024, Kai Huang wrote: > On 21/05/2024 5:35 am, Sean Christopherson wrote: > > On Mon, May 20, 2024, Kai Huang wrote: > > > I am wondering whether this can be done in the KVM page fault handler? > > > > No, because the state of a pfn in the RMP is tied to the guest_memfd inode, > > not to the file descriptor, i.e. not to an individual VM. > > It's strange that as state of a PFN of SNP doesn't bind to individual VM, at > least for the private pages. The command rpm_make_private() indeed reflects > the mapping between PFN <-> <GFN, SSID>. s/SSID/ASID KVM allows a single ASID to be bound to multiple "struct kvm" instances, e.g. for intra-host migration. If/when trusted I/O is a thing, presumably KVM will also need to share the ASID with other entities, e.g. IOMMUFD. > rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), > level, sev->asid, false); > > > And the NPT page tables are treated as ephemeral for SNP. > > Do you mean private mappings for SNP guest can be zapped from the VM (the > private pages are still there unchanged) and re-mapped later w/o needing to > have guest's explicit acceptance? Correct. > If so, I think "we can zap" doesn't mean "we need to zap"? Correct. > Because the privates are now pinned anyway. Pinning is an orthogonal issue. And it's not so much that the pfns are pinned as it is that guest_memfd simply doesn't support page migration or swap at this time. Regardless of whether or not guest_memfd supports page migration, KVM needs to track the state of the physical page in guest_memfd, e.g. if it's been assigned to the ASID versus if it's still in a shared state. > If we truly want to zap private mappings for SNP, IIUC it can be done by > distinguishing whether a VM needs to use a separate private table, which is > TDX-only. I wouldn't say we "want" to zap private mappings for SNP, rather that it's a lot less work to keep KVM's existing behavior (literally do nothing) than it is to rework the MMU and whatnot to not zap SPTEs. And there's no big motivation to avoid zapping because SNP VMs are unlikely to delete memslots. If it turns out that it's easy to preserve SNP mappings after TDX lands, then we can certainly go that route, but AFAIK there's no reason to force the issue.