Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 6 Feb 2024 18:43:38 -0800

On Wed, Feb 07, 2024, Paolo Bonzini wrote:
> On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > It doesn't really matter if the attributes are set before or after
> > > KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches
> > > the pages get set to private so they get faulted in from gmem. We could
> > > document our expectations and enforce them here if that's preferable
> > > however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance
> > > would make it easier to enforce that userspace does the right thing.
> > > I'll see how that looks if there are no objections.
> >
> > Userspace owns whether a page is PRIVATE or SHARED, full stop.  If KVM can't
> > honor that, then we need to come up with better uAPI.
> 
> Can you explain more verbosely what you mean?

As proposed, snp_launch_update_gfn_handler() doesn't verify the state of the
gfns' attributes.  But that's a minor problem and probably not a sticking point.

My overarching complaint is that the code seems to be wildly unsafe, or at the
very least brittle.  Without guest_memfd's knowledge, and without holding any
locks beyond kvm->lock, it:

 1) checks if a pfn is shared in the RMP
 2) copies data to the page
 3) converts the page to private in the RMP
 4) does PSP stuff
 5) on failure, converts the page back to shared in RMP
 6) conditionally on failure, writes to the page via a gfn

I'm not at all confident that 1-4 isn't riddled with TOCTOU bugs, and that's
before KVM gains support for intrahost migration, i.e. before KVM allows multiple
VM instances to bind to a single guest_memfd.

But I _think_ we mostly sorted this out at PUCK.  IIRC, the plan is to have guest_memfd
provide (kernel) APIs to allow arch/vendor code to initialize a guest_memfd range.
That will give guest_memfd complete control over the state of a given page, will
allow guest_memfd to take the appropriate locks, and if we're lucky, will be reusable
by other CoCo flavors beyond SNP.
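
To make the shape of that idea concrete, here is a rough userspace model (not
kernel code).  Every name in it (model_gmem, model_gmem_populate(), the error
codes, the callbacks) is made up for illustration, and the "lock" is just a flag
standing in for whatever locking guest_memfd would actually do.  The point is
only that the state check, the arch/vendor callback, and the state transition
all happen inside one critical section owned by the guest_memfd side:

```c
/* Userspace model only; all names below are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_NR_PAGES  4

enum model_state { MODEL_SHARED = 0, MODEL_PRIVATE = 1 };
enum model_err { MODEL_OK = 0, MODEL_EINVAL = -1, MODEL_EBUSY = -2, MODEL_EFW = -3 };

struct model_gmem {
	int locked;	/* stand-in for the locks guest_memfd would take */
	enum model_state state[MODEL_NR_PAGES];
	unsigned char data[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
};

/* Arch/vendor hook, always invoked with the model's lock held. */
typedef int (*model_init_fn)(struct model_gmem *g, size_t idx,
			     const unsigned char *src, void *opaque);

static int model_gmem_populate(struct model_gmem *g, size_t idx,
			       const unsigned char *src,
			       model_init_fn fn, void *opaque)
{
	int ret;

	if (idx >= MODEL_NR_PAGES || !src || !fn)
		return MODEL_EINVAL;

	assert(!g->locked);
	g->locked = 1;

	/* Check and act under the same lock, so there is no TOCTOU window. */
	if (g->state[idx] != MODEL_SHARED) {
		ret = MODEL_EBUSY;
		goto out;
	}

	ret = fn(g, idx, src, opaque);
	if (ret)
		memset(g->data[idx], 0, MODEL_PAGE_SIZE);	/* roll back */
	else
		g->state[idx] = MODEL_PRIVATE;
out:
	g->locked = 0;
	return ret;
}

/* Example hook: copy the payload; a real hook would also do the PSP work. */
static int model_copy_ok(struct model_gmem *g, size_t idx,
			 const unsigned char *src, void *opaque)
{
	(void)opaque;
	assert(g->locked);
	memcpy(g->data[idx], src, MODEL_PAGE_SIZE);
	return MODEL_OK;
}

/* Example hook that copies and then fails, as if firmware rejected the page. */
static int model_copy_fail(struct model_gmem *g, size_t idx,
			   const unsigned char *src, void *opaque)
{
	(void)opaque;
	assert(g->locked);
	memcpy(g->data[idx], src, MODEL_PAGE_SIZE);
	return MODEL_EFW;
}
```

In this model the caller never touches page state directly: a second populate
of the same index fails with MODEL_EBUSY, and a failing hook leaves the page
shared and zeroed.  Whether the real API ends up callback-based is an
assumption on my part.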

> > > > > +                  * When invalid CPUID function entries are detected, the firmware
> > > > > +                  * corrects these entries for debugging purposes and leaves the
> > > > > +                  * page unencrypted so it can be provided to users for debugging
> > > > > +                  * and error-reporting.
> > > >
> > > > Why?  IIUC, this is basically backdooring reads/writes into guest_memfd to avoid
> > > > having to add proper mmap() support.
> >
> > Yes, I am specifically complaining about writing guest memory on failure, which is
> > all kinds of weird.
> 
> It is weird but I am not sure if you are complaining about firmware
> behavior or something else.

This proposed KVM code:

+                               host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+                               ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+                               if (ret)
+                                       pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+                                              ret);

I have no objection to propagating error/debug information back to userspace,
but it needs to be routed through the source page (or I suppose some dedicated
error page, but that seems like overkill).  Shoving the error information into
guest memory is gross.

But this should naturally go away when the requirement that the source be
covered by the same memslot also goes away.
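
For illustration, routing the error information through the source page might
look roughly like the userspace sketch below.  All names (sketch_fw_validate(),
sketch_launch_update(), the 0xff "invalid entry" marker) are invented for the
example and do not reflect the actual firmware interface; in the kernel the
write-back would presumably be a copy_to_user() to the userspace source
address rather than a memcpy().

```c
/* Userspace sketch only; all names and the 0xff marker are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/*
 * Stand-in for firmware validation: a page whose first byte is 0xff is
 * "invalid".  A corrected copy is always written to fw_out.
 */
static int sketch_fw_validate(const unsigned char *page, unsigned char *fw_out)
{
	memcpy(fw_out, page, SKETCH_PAGE_SIZE);
	if (page[0] != 0xff)
		return 0;

	fw_out[0] = 0;	/* "corrected" entry */
	return -1;
}

/*
 * src:   source page supplied by userspace (writable, used for error info)
 * guest: destination guest page, written only on success
 */
static int sketch_launch_update(unsigned char *src, unsigned char *guest)
{
	unsigned char staged[SKETCH_PAGE_SIZE];
	unsigned char fixed[SKETCH_PAGE_SIZE];
	int ret;

	assert(src && guest);

	memcpy(staged, src, SKETCH_PAGE_SIZE);

	ret = sketch_fw_validate(staged, fixed);
	if (ret) {
		/* Report via the source page; guest memory stays untouched. */
		memcpy(src, fixed, SKETCH_PAGE_SIZE);
		return ret;
	}

	memcpy(guest, staged, SKETCH_PAGE_SIZE);
	return 0;
}
```

On failure userspace finds the corrected entries in the buffer it passed in
and can retry with it; nothing is ever written to guest memory on the error
path.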