Re: [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address

Michael Roth <michael.roth@xxxxxxx> · Tue, 28 Mar 2023 18:31:01 -0500

On Wed, Mar 01, 2023 at 08:21:17AM -0800, Dave Hansen wrote:
> On 2/20/23 10:38, Michael Roth wrote:
> > +static int handle_split_page_fault(struct vm_fault *vmf)
> > +{
> > +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> > +	return 0;
> > +}
> > +
> >  /*
> >   * By the time we get here, we already hold the mm semaphore
> >   *
> > @@ -5078,6 +5084,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >  				pmd_migration_entry_wait(mm, vmf.pmd);
> >  			return 0;
> >  		}
> > +
> > +		if (flags & FAULT_FLAG_PAGE_SPLIT)
> > +			return handle_split_page_fault(&vmf);
> 
> I asked this long ago, but how do you prevent these faults from
> occurring on hugetlbfs mappings that can't be split?
> 

In v6 there was a KVM ioctl to register a user HVA range for use with
SEV-SNP guests, and as part of that registration the code scanned all
the VMAs covered by that range and checked vma->vm_flags for
VM_HUGETLB.
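For illustration, the v6-style check amounts to an overlap walk over the
VMAs touched by the range. This is a toy userspace model, not the v6
code: struct vma_model, MODEL_VM_HUGETLB (a stand-in for VM_HUGETLB) and
range_has_hugetlb() are all hypothetical names, and it assumes the
registration would be refused when the function returns true.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's VM_HUGETLB vm_flags bit. */
#define MODEL_VM_HUGETLB 0x1UL

/* Hypothetical, simplified VMA: covers [start, end) with some flags. */
struct vma_model {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/*
 * Return true if any VMA overlapping the HVA range [start, end) is
 * hugetlbfs-backed, i.e. a range a v6-style registration would reject.
 */
static bool range_has_hugetlb(const struct vma_model *vmas, size_t nr,
			      unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < nr; i++) {
		const struct vma_model *v = &vmas[i];

		/* Skip VMAs entirely outside [start, end). */
		if (v->end <= start || v->start >= end)
			continue;
		if (v->flags & MODEL_VM_HUGETLB)
			return true;
	}
	return false;
}
```

With one plain VMA at [0x100000, 0x200000) and one hugetlb VMA at
[0x200000, 0x400000), the range [0x100000, 0x200000) passes, while
[0x180000, 0x280000) is flagged because it crosses into the hugetlb VMA.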

With v7+ this registration mechanism has been replaced by the new
restricted memfd implementation provided by UPM to manage private guest
memory. Normal shmem/memfd backends can request HugeTLBFS via the
MFD_HUGETLB flag when creating the memfd, but restricted memfd allows no
special flags, so HugeTLBFS isn't possible for the pages used for
private memory. It might still make sense to enforce that in
SNP-specific code, though, in case restricted memfd eventually gains
that ability...

But now, with v7+, the non-private memory that isn't allocated via
restricted memfd (and thus can actually be mapped into userspace and
used for things like buffers shared between host and guest) can still be
allocated via HugeTLBFS, since SNP does nothing to specifically guard
against that. So we'd probably want to reimplement logic similar to what
was in v6 to guard against this, since it's these mappings that could
trigger the RMP faults and require splitting.

However...

The fact that any pages potentially triggering these #PFs can be mapped
as 2M in the first place means that all the PFNs covered by that 2M
mapping must also have been allocated via mappable/VMA memory rather
than via restricted memfd, where userspace mappings are not possible.
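To make the arithmetic behind that argument concrete: with 4K base pages,
a 2M mapping covers 1 << (21 - 12) = 512 PFNs starting at a 2M-aligned
PFN, and the argument is that none of those 512 can be private. Below is
a toy userspace model of that per-2M-region condition, not kernel code;
the MODEL_* constants, the is_private table and both helpers are
hypothetical.

```c
#include <stdbool.h>

/* 4K base pages and 2M PMD-level mappings, as on x86-64. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PMD_SHIFT  21
/* Number of 4K PFNs covered by one 2M mapping: 512. */
#define MODEL_PFNS_PER_PMD (1UL << (MODEL_PMD_SHIFT - MODEL_PAGE_SHIFT))

/* First PFN of the 2M-aligned region containing pfn. */
static unsigned long pmd_base_pfn(unsigned long pfn)
{
	return pfn & ~(MODEL_PFNS_PER_PMD - 1);
}

/*
 * Hypothetical predicate: the 2M region containing pfn can only be
 * mapped 2M into userspace if none of its PFNs is private (restricted
 * memfd) memory. is_private must cover the whole 2M region.
 */
static bool pmd_mappable(const bool *is_private, unsigned long pfn)
{
	unsigned long base = pmd_base_pfn(pfn);

	for (unsigned long i = 0; i < MODEL_PFNS_PER_PMD; i++)
		if (is_private[base + i])
			return false;
	return true;
}
```

For example, PFN 700 lies in the region starting at PFN 512; marking any
PFN in 512..1023 as private makes that region unmappable as 2M, while
the region at PFN 0 is unaffected.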

So I think we should be able to drop this patch entirely, and also
allow the use of HugeTLBFS for non-restricted memfd memory (though since
the guest will eventually switch all its memory to private/restricted,
we wouldn't gain much there other than reduced management complexity).

-Mike