Re: [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory

Steven Price <steven.price@xxxxxxx> · Wed, 13 Mar 2024 17:11:29 +0000

On 12/03/2024 20:26, Sean Christopherson wrote:
> On Mon, Mar 11, 2024, Michael Roth wrote:
>> On Fri, Feb 09, 2024 at 07:13:13AM -0800, Sean Christopherson wrote:
>>> On Fri, Feb 09, 2024, Steven Price wrote:
>>>>>> One option that I've considered is to implement a seperate CCA ioctl to
>>>>>> notify KVM whether the memory should be mapped protected.
>>>>>
>>>>> That's what KVM_SET_MEMORY_ATTRIBUTES+KVM_MEMORY_ATTRIBUTE_PRIVATE is for, no?
>>>>
>>>> Sorry, I really didn't explain that well. Yes effectively this is the
>>>> attribute flag, but there's corner cases for destruction of the VM. My
>>>> thought was that if the VMM wanted to tear down part of the protected
>>>> range (without making it shared) then a separate ioctl would be needed
>>>> to notify KVM of the unmap.
>>>
>>> No new uAPI should be needed, because the only scenario time a benign VMM should
>>> do this is if the guest also knows the memory is being removed, in which case
>>> PUNCH_HOLE will suffice.
>>>
>>>>>> This 'solves' the problem nicely except for the case where the VMM
>>>>>> deliberately punches holes in memory which the guest is using.
>>>>>
>>>>> I don't see what problem there is to solve in this case.  PUNCH_HOLE is destructive,
>>>>> so don't do that.
>>>>
>>>> A well behaving VMM wouldn't PUNCH_HOLE when the guest is using it, but
>>>> my concern here is a VMM which is trying to break the host. In this case
>>>> either the PUNCH_HOLE needs to fail, or we actually need to recover the
>>>> memory from the guest (effectively killing the guest in the process).
>>>
>>> The latter.  IIRC, we talked about this exact case somewhere in the hour-long
>>> rambling discussion on guest_memfd at PUCK[1].  And we've definitely discussed
>>> this multiple times on-list, though I don't know that there is a single thread
>>> that captures the entire plan.
>>>
>>> The TL;DR is that gmem will invoke an arch hook for every "struct kvm_gmem"
>>> instance that's attached to a given guest_memfd inode when a page is being fully
>>> removed, i.e. when a page is being freed back to the normal memory pool.  Something
>>> like this proposed SNP patch[2].
>>>
>>> Mike, do have WIP patches you can share?
>>
>> Sorry, I missed this query earlier. I'm a bit confused though, I thought
>> the kvm_arch_gmem_invalidate() hook provided in this patch was what we
>> ended up agreeing on during the PUCK call in question.
> 
> Heh, I trust your memory of things far more than I trust mine.  I'm just proving
> Cunningham's Law.  :-)
> 
>> There was an open question about what to do if a use-case came along
>> where we needed to pass additional parameters to
>> kvm_arch_gmem_invalidate() other than just the start/end PFN range for
>> the pages being freed, but we'd determined that SNP and TDX did not
>> currently need this, so I didn't have any changes planned in this
>> regard.
>>
>> If we now have such a need, what we had proposed was to modify
>> __filemap_remove_folio()/page_cache_delete() to defer setting
>> folio->mapping to NULL so that we could still access it in
>> kvm_gmem_free_folio() so that we can still access mapping->i_private_list
>> to get the list of gmem/KVM instances and pass them on via
>> kvm_arch_gmem_invalidate().
> 
> Yeah, this is what I was remembering.  I obviously forgot that we didn't have a
> need to iterate over all bindings at this time.
> 
>> So that's doable, but it's not clear from this discussion that that's
>> needed.
> 
> Same here.  And even if it is needed, it's not your problem to solve.  The above
> blurb about needing to preserve folio->mapping being free_folio() is sufficient
> to get the ARM code moving in the right direction.
> 
> Thanks!
> 
>> If the idea to block/kill the guest if VMM tries to hole-punch,
>> and ARM CCA already has plans to wire up the shared/private flags in
>> kvm_unmap_gfn_range(), wouldn't that have all the information needed to
>> kill that guest? At that point, kvm_gmem_free_folio() can handle
>> additional per-page cleanup (with additional gmem/KVM info plumbed in
>> if necessary).

Yes, the missing piece of the puzzle was provided by "KVM: Prepare for
handling only shared mappings in mmu_notifier events"[1] - namely the
"only_shared" flag. We don't need to actually block/kill the guest until
it attempts access to the memory which has been removed from the guest -
at that point the guest cannot continue because the security properties
have been violated (the protected memory contents have been lost) so
attempts to continue the guest will fail.

You can ignore most of my other ramblings - as long as everyone is happy
with that flag then Arm CCA should be fine. I was just looking at other
options.

Thanks,

Steve

[1]
https://lore.kernel.org/lkml/20231027182217.3615211-13-seanjc@xxxxxxxxxx/