On Mon, Mar 11, 2024, Michael Roth wrote:
> On Fri, Feb 09, 2024 at 07:13:13AM -0800, Sean Christopherson wrote:
> > On Fri, Feb 09, 2024, Steven Price wrote:
> > > >> One option that I've considered is to implement a separate CCA
> > > >> ioctl to notify KVM whether the memory should be mapped protected.
> > > >
> > > > That's what KVM_SET_MEMORY_ATTRIBUTES+KVM_MEMORY_ATTRIBUTE_PRIVATE
> > > > is for, no?
> > >
> > > Sorry, I really didn't explain that well. Yes, effectively this is
> > > the attribute flag, but there are corner cases for destruction of the
> > > VM. My thought was that if the VMM wanted to tear down part of the
> > > protected range (without making it shared) then a separate ioctl
> > > would be needed to notify KVM of the unmap.
> >
> > No new uAPI should be needed, because the only time a benign VMM
> > should do this is if the guest also knows the memory is being removed,
> > in which case PUNCH_HOLE will suffice.
> >
> > > >> This 'solves' the problem nicely except for the case where the VMM
> > > >> deliberately punches holes in memory which the guest is using.
> > > >
> > > > I don't see what problem there is to solve in this case. PUNCH_HOLE
> > > > is destructive, so don't do that.
> > >
> > > A well-behaving VMM wouldn't PUNCH_HOLE while the guest is using the
> > > memory, but my concern here is a VMM which is trying to break the
> > > host. In this case either the PUNCH_HOLE needs to fail, or we
> > > actually need to recover the memory from the guest (effectively
> > > killing the guest in the process).
> >
> > The latter. IIRC, we talked about this exact case somewhere in the
> > hour-long rambling discussion on guest_memfd at PUCK[1]. And we've
> > definitely discussed this multiple times on-list, though I don't know
> > that there is a single thread that captures the entire plan.
> >
> > The TL;DR is that gmem will invoke an arch hook for every "struct
> > kvm_gmem" instance that's attached to a given guest_memfd inode when a
> > page is being fully removed, i.e. when a page is being freed back to
> > the normal memory pool. Something like this proposed SNP patch[2].
> >
> > Mike, do you have WIP patches you can share?

> Sorry, I missed this query earlier. I'm a bit confused though, I thought
> the kvm_arch_gmem_invalidate() hook provided in this patch was what we
> ended up agreeing on during the PUCK call in question.

Heh, I trust your memory of things far more than I trust mine. I'm just
proving Cunningham's Law. :-)

> There was an open question about what to do if a use-case came along
> where we needed to pass additional parameters to
> kvm_arch_gmem_invalidate() other than just the start/end PFN range for
> the pages being freed, but we'd determined that SNP and TDX did not
> currently need this, so I didn't have any changes planned in this
> regard.
>
> If we now have such a need, what we had proposed was to modify
> __filemap_remove_folio()/page_cache_delete() to defer setting
> folio->mapping to NULL, so that kvm_gmem_free_folio() could still access
> mapping->i_private_list to get the list of gmem/KVM instances and pass
> them on via kvm_arch_gmem_invalidate().

Yeah, this is what I was remembering. I obviously forgot that we didn't
have a need to iterate over all bindings at this time.

> So that's doable, but it's not clear from this discussion that that's
> needed.

Same here. And even if it is needed, it's not your problem to solve.
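In case it helps the ARM folks, here is a rough sketch of what that
"walk all bindings" variant could look like. Illustrative only, not
tested: it assumes the deferred folio->mapping clearing described above,
and passing the gmem instance to kvm_arch_gmem_invalidate() is a
hypothetical extension (the hook proposed in [2] takes only a PFN
range).

static void kvm_gmem_free_folio(struct folio *folio)
{
	/*
	 * Assumes __filemap_remove_folio()/page_cache_delete() have been
	 * modified to defer clearing folio->mapping, so that it is still
	 * valid here instead of already being NULL.
	 */
	struct address_space *mapping = folio->mapping;
	kvm_pfn_t start = folio_pfn(folio);
	kvm_pfn_t end = start + folio_nr_pages(folio);
	struct kvm_gmem *gmem;

	/* Walk every gmem/KVM instance bound to this guest_memfd inode. */
	list_for_each_entry(gmem, &mapping->i_private_list, entry)
		kvm_arch_gmem_invalidate(gmem, start, end);
}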
The above blurb about needing to preserve folio->mapping until
free_folio() is sufficient to get the ARM code moving in the right
direction. Thanks!

> If the idea is to block/kill the guest if the VMM tries to hole-punch,
> and ARM CCA already has plans to wire up the shared/private flags in
> kvm_unmap_gfn_range(), wouldn't that have all the information needed to
> kill that guest? At that point, kvm_gmem_free_folio() can handle
> additional per-page cleanup (with additional gmem/KVM info plumbed in
> if necessary).
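For reference, the hole-punch being discussed is plain fallocate() on
the guest_memfd file descriptor, no new uAPI involved. A minimal
userspace sketch, where gmem_discard() is an invented helper name:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>

/*
 * Discard a range of guest memory by punching a hole in the
 * guest_memfd.  PUNCH_HOLE must be paired with KEEP_SIZE.
 */
static int gmem_discard(int gmem_fd, off_t offset, off_t len)
{
	return fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}

A benign VMM only issues this for ranges the guest knows are going away;
the debate above is about a malicious VMM making exactly this call while
the guest still owns the memory.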