On 11/12/21 7:43 AM, Peter Gonda wrote: ... > Here is an alternative to the current approach: On RMP violation (host > or userspace) the page fault handler converts the page from private to > shared to allow the write to continue. This pulls from s390’s error > handling which does exactly this. See ‘arch_make_page_accessible()’. > Additionally it adds less complexity to the SNP kernel patches, and > requires no new ABI. I think it's important to very carefully describe where these RMP page faults can occur within the kernel. They can obvious occur on things like: copy_to_user(&user_buf, &kernel_buf, len); That's not a big deal. Those can obviously handle page faults. We know exactly the instruction on which the fault can occur and we handle it gracefully. *But*, these are harder: get_user_pages(addr, len, &pages); spin_lock(&lock); ptr = kmap_atomic(pages[0]); *ptr = foo; // #PF here kunmap_atomic(ptr); spin_unlock(&lock); put_page(pages[0]); In this case, the place where the fault happens are not especially well bounded. It can be in compiler-generated code. It can happen on any random instruction in there. Or, is there some mechanism that prevent guest-private memory from being accessed in random host kernel code? > This proposal does cause guest memory corruption for some bugs but one > of SEV-SNP’s goals extended from SEV-ES’s goals is for guest’s to be > able to detect when its memory has been corrupted / replayed by the > host. So SNP already has features for allowing guests to detect this > kind of memory corruption. Additionally this is very similar to a page > of memory generating a machine check because of 2-bit memory > corruption. In other words SNP guests must be enlightened and ready > for these kinds of errors. > > For an SNP guest running under this proposal the flow would look like this: > * Host gets a #PF because its trying to write to a private page. > * Host #PF handler updates the page to shared. > * Write continues normally. > * Guest accesses memory (r/w). > * Guest gets a #VC error because the page is not PVALIDATED > * Guest is now in control. Guest can terminate because its memory has > been corrupted. Guest could try and continue to log the error to its > owner. This sounds like a _possible_ opportunity for the guest to do some extra handling. It's also quite possible that this #VC happens in a place that the guest can't handle. > A similar approach was introduced in the SNP patches V1 and V2 for > kernel page fault handling. The pushback around this convert to shared > approach was largely focused around the idea that the kernel has all > the information about which pages are shared vs private so it should > be able to check shared status before write to pages. After V2 the > patches were updated to not have a kernel page fault handler for RMP > violations (other than dumping state during a panic). The current > patches protect the host with new post_{map,unmap}_gfn() function that > checks if a page is shared before mapping it, then locks the page > shared until unmapped. Given the discussions on ‘[Part2,v5,39/45] KVM: > SVM: Introduce ops for the post gfn map and unmap’ building a solution > to do this is non trivial and adds new overheads to KVM. Additionally > the current solution is local to the kernel. So a new ABI just now be > created to allow the userspace VMM to access the kernel-side locks for > this to work generically for the whole host. This is more complicated > than this proposal and adding more lock holders seems like it could > reduce performance further. The locking is complicated. But, I think it is necessary. Once the kernel can map private memory, it can access it in random spots. It's hard to make random kernel code recoverable from faults.