On Fri, Feb 09, 2024 at 07:13:13AM -0800, Sean Christopherson wrote:
> On Fri, Feb 09, 2024, Steven Price wrote:
> > >> One option that I've considered is to implement a separate CCA ioctl to
> > >> notify KVM whether the memory should be mapped protected.
> > >
> > > That's what KVM_SET_MEMORY_ATTRIBUTES+KVM_MEMORY_ATTRIBUTE_PRIVATE is for, no?
> >
> > Sorry, I really didn't explain that well. Yes effectively this is the
> > attribute flag, but there are corner cases for destruction of the VM. My
> > thought was that if the VMM wanted to tear down part of the protected
> > range (without making it shared) then a separate ioctl would be needed
> > to notify KVM of the unmap.
>
> No new uAPI should be needed, because the only time a benign VMM should
> do this is if the guest also knows the memory is being removed, in which
> case PUNCH_HOLE will suffice.
>
> > >> This 'solves' the problem nicely except for the case where the VMM
> > >> deliberately punches holes in memory which the guest is using.
> > >
> > > I don't see what problem there is to solve in this case. PUNCH_HOLE is
> > > destructive, so don't do that.
> >
> > A well-behaved VMM wouldn't PUNCH_HOLE when the guest is using it, but
> > my concern here is a VMM which is trying to break the host. In this case
> > either the PUNCH_HOLE needs to fail, or we actually need to recover the
> > memory from the guest (effectively killing the guest in the process).
>
> The latter. IIRC, we talked about this exact case somewhere in the hour-long
> rambling discussion on guest_memfd at PUCK[1]. And we've definitely discussed
> this multiple times on-list, though I don't know that there is a single thread
> that captures the entire plan.
>
> The TL;DR is that gmem will invoke an arch hook for every "struct kvm_gmem"
> instance that's attached to a given guest_memfd inode when a page is being
> fully removed, i.e. when a page is being freed back to the normal memory
> pool. Something like this proposed SNP patch[2].
>
> Mike, do you have WIP patches you can share?

Sorry, I missed this query earlier. I'm a bit confused though: I thought
the kvm_arch_gmem_invalidate() hook provided in this patch was what we
ended up agreeing on during the PUCK call in question.

There was an open question about what to do if a use case came along where
we needed to pass additional parameters to kvm_arch_gmem_invalidate() other
than just the start/end PFN range for the pages being freed, but we'd
determined that SNP and TDX did not currently need this, so I didn't have
any changes planned in this regard.

If we now have such a need, what we had proposed was to modify
__filemap_remove_folio()/page_cache_delete() to defer setting
folio->mapping to NULL, so that kvm_gmem_free_folio() can still access
mapping->i_private_list to get the list of gmem/KVM instances and pass
them on via kvm_arch_gmem_invalidate().

So that's doable, but it's not clear from this discussion that it's
needed. If the idea is to block/kill the guest when the VMM tries to
hole-punch, and ARM CCA already has plans to wire up the shared/private
flags in kvm_unmap_gfn_range(), wouldn't that have all the information
needed to kill that guest? At that point, kvm_gmem_free_folio() can handle
additional per-page cleanup (with additional gmem/KVM info plumbed in if
necessary).
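To make that a bit more concrete, here's a rough sketch (not the actual
guest_memfd code) of what the deferred-mapping approach could look like in
kvm_gmem_free_folio(). kvm_arch_gmem_invalidate_vm() is purely hypothetical
here; only the PFN-range-only kvm_arch_gmem_invalidate() exists in the
posted patches:

/*
 * Illustrative sketch only (virt/kvm/guest_memfd.c context): assumes
 * __filemap_remove_folio()/page_cache_delete() are changed to defer
 * clearing folio->mapping until after the ->free_folio() callback runs,
 * so the inode's i_private_list is still reachable from here.
 */
static void kvm_gmem_free_folio(struct folio *folio)
{
	struct address_space *mapping = folio->mapping;
	kvm_pfn_t start = folio_pfn(folio);
	kvm_pfn_t end = start + folio_nr_pages(folio);
	struct kvm_gmem *gmem;

	/* Arch cleanup that only needs the PFN range, e.g. SNP RMP reset. */
	kvm_arch_gmem_invalidate(start, end);

	/*
	 * Only needed if a use case shows up that wants per-VM context:
	 * walk the gmem instances bound to this inode and pass each KVM
	 * along to a hypothetical extended hook.
	 */
	list_for_each_entry(gmem, &mapping->i_private_list, entry)
		kvm_arch_gmem_invalidate_vm(gmem->kvm, start, end);
}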
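And for completeness on the "no new uAPI" point above: the two VMM-side
knobs being discussed are just the existing interfaces, roughly as below.
The fd/gpa/size/offset/len names and helper functions are made up for
illustration:

#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/falloc.h>
#include <linux/kvm.h>

/* Flip a GPA range to private via memory attributes (no backing freed). */
static void set_range_private(int vm_fd, __u64 gpa, __u64 size)
{
	struct kvm_memory_attributes attrs = {
		.address    = gpa,
		.size       = size,
		.attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
	};

	if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs))
		err(1, "KVM_SET_MEMORY_ATTRIBUTES");
}

/* Actually free backing pages: PUNCH_HOLE on the guest_memfd, no new ioctl. */
static void gmem_remove_range(int gmem_fd, off_t offset, off_t len)
{
	if (fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len))
		err(1, "fallocate(PUNCH_HOLE)");
}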
-Mike

[1] https://lore.kernel.org/kvm/20240202230611.351544-1-seanjc@xxxxxxxxxx/T/

>
> [1] https://drive.google.com/corp/drive/folders/116YTH1h9yBZmjqeJc03cV4_AhSe-VBkc?resourcekey=0-sOGeFEUi60-znJJmZBsTHQ
> [2] https://lore.kernel.org/all/20231230172351.574091-30-michael.roth@xxxxxxx