> On May 3, 2021, at 10:19 AM, Brijesh Singh <brijesh.singh@xxxxxxx> wrote:
>
>
>> On 5/3/21 11:15 AM, Dave Hansen wrote:
>>> On 5/3/21 8:37 AM, Brijesh Singh wrote:
>>> GHCB was just an example. Another example is a vfio driver accessing the
>>> shared page. If those pages are not marked shared then kernel access
>>> will cause an RMP fault. Ideally we should not be running into this
>>> situation, but if we do, then I am trying to see how best we can avoid
>>> the host crashes.
>> I'm confused. Are you suggesting that the VFIO driver could be passed
>> an address such that the host kernel would blindly try to write private
>> guest memory?
>
> Not blindly. But a guest could trick a VMM (qemu) to ask the host driver
> to access a GPA which is a guest private page (It's a hypothetical case, so
> it's possible that I may be missing something). Let's see with an example:
>
> - A guest provides a GPA to VMM to write to (e.g. a DMA operation).
>
> - VMM translates the GPA->HVA and calls down to host kernel with the HVA.
>
> - The host kernel may pin the HVA to get the PFN for it and then kmap().
> Write to the mapped PFN will cause an RMP fault if the guest provided
> GPA was not marked shared in the RMP table. In an ideal world, a guest
> should *never* do this but what if it does?
>
>
>> The host kernel *knows* which memory is guest private and what is
>> shared. It had to set it up in the first place. It can also consult
>> the RMP at any time if it somehow forgot.
>>
>> So, this scenario seems to be that the host got a guest physical address
>> (gpa) from the guest, it did a gpa->hpa->hva conversion and then wrote
>> the page all without bothering to consult the RMP. Shouldn't the
>> gpa->hpa conversion point offer a perfect place to determine if the page
>> is shared or private?
>
> The GPA->HVA is typically done by the VMM, and HVA->HPA is done by the
> host drivers. So, the only time we could verify is after the HVA->HPA.
> One of my patches provides a snp_lookup_page_in_rmptable() helper that can be
> used to query the page state in the RMP table. This means that all the
> host backend drivers need to be enlightened to always read the RMP table
> before making a write access to a guest provided GPA. A good guest should
> *never* be using a private page for the DMA operation and if it does
> then the fault handler introduced in this patch can avoid the host crash
> and eliminate the need to enlighten the drivers to check for the
> permission before the access.

Can we arrange for the page walk plus kmap process to fail?

>
> I felt it is a good idea to have some kind of recovery, especially when a
> malicious guest could lead us into this path.
>
>
>>
>>> Another reason for having this is to catch hypervisor bugs. During
>>> SNP guest creation, KVM allocates a few backing pages and sets the
>>> assigned bit for them (the examples are VMSA, and firmware context page).
>>> If the hypervisor accidentally frees these pages without clearing the
>>> assigned bit in the RMP table then it will result in an RMP fault and thus
>>> a kernel crash.
>> I think I'd be just fine with a BUG_ON() in those cases instead of an
>> attempt to paper over the issue. Kernel crashes are fine in the case of
>> kernel bugs.
>
> Yes, fine with me.
>
>
>>
>>>> Or, worst case, you could use exception tables and something like
>>>> copy_to_user() to write to the GHCB. That way, the thread doing the
>>>> write can safely recover from the fault without the instruction actually
>>>> ever finishing execution.
>>>>
>>>> BTW, I went looking through the spec. I didn't see anything about the
>>>> guest being able to write the "Assigned" RMP bit. Did I miss that?
>>>> Which of the above three conditions is triggered by the guest failing to
>>>> make the GHCB page shared?
>>> The GHCB spec section "Page State Change" provides an interface for the
>>> guest to request the page state change.
>>> During bootup, the guest uses
>>> the Page State Change VMGEXIT to request the hypervisor to make the page
>>> shared. The hypervisor uses the RMPUPDATE instruction to write to the
>>> "assigned" bit in the RMP table.
>> Right... So the *HOST* is in control. Why should the host ever be
>> surprised by a page transitioning from shared to private?
>
> I am trying to cover the malicious guest case. A good guest should
> follow the GHCB spec and change the page state before the access.
>
>>
>>> On VMGEXIT, the very first thing which the vmgexit handler does is to map
>>> the GHCB page for the access and then later use copy_to_user() to
>>> sync the GHCB updates from hypervisor to guest. The copy_to_user() will
>>> cause an RMP fault if the GHCB is not mapped shared. As I explained
>>> above, GHCB page was just an example, vfio or others may also get into
>>> this situation.
>> Causing an RMP fault is fine. The problem is shoving a whole bunch of
>> *recovery* code in the kernel when recovery isn't necessary. Just look
>> for the -EFAULT from copy_to_user() and move on with life.
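
To make the "look for the -EFAULT and move on" shape concrete, here is a
rough userspace sketch. It is only a toy model, not kernel code: the RMP is
reduced to an array holding one "assigned" flag per page, and every toy_*
name is hypothetical and does not appear in the patch series. In the real
kernel the failing write would be caught through exception tables (as with
copy_to_user()) rather than by an explicit check before the write.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NR_PAGES  4

/* Hypothetical stand-in for an RMP entry: only the "assigned" bit. */
struct toy_rmp_entry {
	bool assigned;	/* true: guest-private, host writes must fail */
};

static struct toy_rmp_entry toy_rmp[TOY_NR_PAGES];
static unsigned char toy_mem[TOY_NR_PAGES][TOY_PAGE_SIZE];

/* Models the host changing a page's state (RMPUPDATE on real hardware). */
static void toy_set_assigned(size_t pfn, bool assigned)
{
	assert(pfn < TOY_NR_PAGES);
	toy_rmp[pfn].assigned = assigned;
}

/*
 * Models a fault-tolerant write: an access that would hit a private page
 * is reported as -EFAULT instead of taking the host down.
 */
static int toy_write_guest_page(size_t pfn, size_t off,
				const void *src, size_t len)
{
	if (pfn >= TOY_NR_PAGES || off > TOY_PAGE_SIZE ||
	    len > TOY_PAGE_SIZE - off)
		return -EFAULT;
	if (toy_rmp[pfn].assigned)
		return -EFAULT;
	memcpy(&toy_mem[pfn][off], src, len);
	return 0;
}

/* Caller side: no recovery logic, just propagate the error. */
static int toy_sync_ghcb(size_t pfn, const void *buf, size_t len)
{
	int ret = toy_write_guest_page(pfn, 0, buf, len);

	if (ret)
		return ret;	/* e.g. the caller could fail the guest request */
	return 0;
}
```

The point of the sketch is that the caller only has to propagate an error
code; a misbehaving guest gets a failed request and the host keeps running.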