On Wed, Mar 01, 2023 at 08:21:17AM -0800, Dave Hansen wrote:
> On 2/20/23 10:38, Michael Roth wrote:
> > +static int handle_split_page_fault(struct vm_fault *vmf)
> > +{
> > +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> > +	return 0;
> > +}
> > +
> >  /*
> >   * By the time we get here, we already hold the mm semaphore
> >   *
> > @@ -5078,6 +5084,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >  		pmd_migration_entry_wait(mm, vmf.pmd);
> >  		return 0;
> >  	}
> > +
> > +	if (flags & FAULT_FLAG_PAGE_SPLIT)
> > +		return handle_split_page_fault(&vmf);
> 
> I asked this long ago, but how do you prevent these faults from
> occurring on hugetlbfs mappings that can't be split?
> 

In v6 there used to be a KVM ioctl to register a user HVA range for use with SEV-SNP guests, and as part of that registration the code would scan all the VMAs encompassed by that range and check for VM_HUGETLB in vma->vm_flags. With v7+ this registration mechanism has been replaced with the new restricted memfd implementation provided by UPM to manage private guest memory. A normal shmem/memfd backend can request HugeTLBFS via the MFD_HUGETLB flag when creating the memfd, but restricted memfd accepts no special flags, so HugeTLBFS isn't possible for the pages that are used for private memory. It might still make sense to enforce that in SNP-specific code, though, in case restricted memfd eventually gains that ability...

But now, with v7+, the non-private memory that doesn't get allocated via restricted memfd (and thus can actually be mapped into userspace and used for things like buffers shared between host and guest) can still be allocated via HugeTLBFS, since SNP does nothing specific to guard against that. So we'd probably want to reimplement logic similar to what was in v6 to guard against this, since it's these mappings that would potentially trigger the RMP faults and require splitting.

However...
The fact that any pages potentially triggering these #PFs can be mapped as 2M in the first place means that all the PFNs covered by that 2M mapping must also have been allocated via mappable/VMA memory rather than via restricted memfd, where userspace mappings are not possible.

So I think we should be able to drop this patch entirely, as well as allow the use of HugeTLBFS for non-restricted memfd memory (though eventually the guest will switch all its memory to private/restricted, so there's not much to gain there other than reduced management complexity).

-Mike