Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

Dave Hansen <dave.hansen@xxxxxxxxx> · Fri, 12 Nov 2021 09:59:46 -0800

On 11/12/21 7:43 AM, Peter Gonda wrote:
...
> Here is an alternative to the current approach: On RMP violation (host
> or userspace) the page fault handler converts the page from private to
> shared to allow the write to continue. This pulls from s390’s error
> handling which does exactly this. See ‘arch_make_page_accessible()’.
> Additionally it adds less complexity to the SNP kernel patches, and
> requires no new ABI.

I think it's important to very carefully describe where these RMP page
faults can occur within the kernel.  They can obvious occur on things like:

	copy_to_user(&user_buf, &kernel_buf, len);

That's not a big deal.  Those can obviously handle page faults.  We know
exactly the instruction on which the fault can occur and we handle it
gracefully.

*But*, these are harder:

	get_user_pages(addr, len, &pages);
	spin_lock(&lock);
	ptr = kmap_atomic(pages[0]);
	*ptr = foo; // #PF here
	kunmap_atomic(ptr);
	spin_unlock(&lock);
	put_page(pages[0]);

In this case, the place where the fault happens are not especially well
bounded.  It can be in compiler-generated code.  It can happen on any
random instruction in there.

Or, is there some mechanism that prevent guest-private memory from being
accessed in random host kernel code?

> This proposal does cause guest memory corruption for some bugs but one
> of SEV-SNP’s goals extended from SEV-ES’s goals is for guest’s to be
> able to detect when its memory has been corrupted / replayed by the
> host. So SNP already has features for allowing guests to detect this
> kind of memory corruption. Additionally this is very similar to a page
> of memory generating a machine check because of 2-bit memory
> corruption. In other words SNP guests must be enlightened and ready
> for these kinds of errors.
> 
> For an SNP guest running under this proposal the flow would look like this:
> * Host gets a #PF because its trying to write to a private page.
> * Host #PF handler updates the page to shared.
> * Write continues normally.
> * Guest accesses memory (r/w).
> * Guest gets a #VC error because the page is not PVALIDATED
> * Guest is now in control. Guest can terminate because its memory has
> been corrupted. Guest could try and continue to log the error to its
> owner.

This sounds like a _possible_ opportunity for the guest to do some extra
handling.  It's also quite possible that this #VC happens in a place
that the guest can't handle.

> A similar approach was introduced in the SNP patches V1 and V2 for
> kernel page fault handling. The pushback around this convert to shared
> approach was largely focused around the idea that the kernel has all
> the information about which pages are shared vs private so it should
> be able to check shared status before write to pages. After V2 the
> patches were updated to not have a kernel page fault handler for RMP
> violations (other than dumping state during a panic). The current
> patches protect the host with new post_{map,unmap}_gfn() function that
> checks if a page is shared before mapping it, then locks the page
> shared until unmapped. Given the discussions on ‘[Part2,v5,39/45] KVM:
> SVM: Introduce ops for the post gfn map and unmap’ building a solution
> to do this is non trivial and adds new overheads to KVM. Additionally
> the current solution is local to the kernel. So a new ABI just now be
> created to allow the userspace VMM to access the kernel-side locks for
> this to work generically for the whole host. This is more complicated
> than this proposal and adding more lock holders seems like it could
> reduce performance further.

The locking is complicated.  But, I think it is necessary.  Once the
kernel can map private memory, it can access it in random spots.  It's
hard to make random kernel code recoverable from faults.