> What happens if it does so for a page that a guest hasn't shared back?

When the hole is punched, KVM will unmap the corresponding private SPTEs.
If the guest is still accessing the page as private, the next access will
fault and KVM will exit to userspace with KVM_EXIT_MEMORY_ERROR.

Of course the guest is probably hosed if the hole punch was truly spurious,
as at least hardware-based protected VMs effectively destroy data when a
private page is unmapped from the guest private SPTEs. E.g. Linux guests
for TDX and SNP will panic/terminate in such a scenario as they will get a
fault (injected by trusted hardware/firmware) saying that the guest is
trying to access an unaccepted/unvalidated page (TDX and SNP require the
guest to explicitly accept all private pages that aren't part of the
guest's initial pre-boot image).

> > Another important detail is that this approach means the kernel and KVM treat the
> > shared backing store and private backing store as independent, albeit related,
> > entities. This is very deliberate as it makes it easier to reason about what is
> > and isn't allowed/required. E.g. the kernel only needs to handle freeing private
> > memory, there is no special handling for conversion to shared because no such path
> > exists as far as host pfns are concerned. And userspace doesn't need any new "rules"
> > for protecting itself against a malicious guest, e.g. userspace already needs to
> > ensure that it has a valid mapping prior to accessing guest memory (or be able to
> > handle any resulting signals). A malicious guest can DoS itself by instructing
> > userspace to communicate over memory that is currently mapped private, but there
> > are no new novel attack vectors from the host's perspective as coercing the host
> > into accessing an invalid mapping after shared=>private conversion is just a variant
> > of a use-after-free.
>
> Interesting. I was (maybe incorrectly) assuming that it would be
> difficult to handle illegal host accesses w/ TDX. IOW, this would
> essentially crash the host. Is this remotely correct or did I get that
> wrong?

Handling illegal host kernel accesses for both TDX and SEV-SNP is extremely
difficult, bordering on impossible. That's one of the biggest, if not _the_
biggest, motivations for the private fd approach. On "conversion", the page
that is used to back the shared variant is a completely different, unrelated
host physical page. Whether or not the private/shared backing page is freed
is orthogonal to which version is mapped into the guest. E.g. if the guest
converts a 4KB chunk of a 2MB hugepage, the private backing store could keep
the physical page on hole punch (example only, I don't know if this is the
actual proposed implementation).

The idea is that it'll be much, much more difficult for the host to perform
an illegal access if the actual private memory is not mapped anywhere (modulo
the kernel's direct map, which we may or may not leave intact). The private
backing store just needs to ensure it properly sanitizes pages before
freeing them.
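
To make the hole-punch semantics a bit more concrete, here is a minimal
userspace sketch. It uses an ordinary memfd as a stand-in; the private fd
interface discussed in this thread is a proposal and is not shown. The
function name punch_discards_data() is made up for illustration. All it
demonstrates is that fallocate(FALLOC_FL_PUNCH_HOLE) drops the backing
pages so the old contents are gone. It says nothing about how KVM zaps
SPTEs or how a private backing store would sanitize pages.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* */
#include <string.h>     /* memset() */
#include <sys/mman.h>   /* memfd_create() */
#include <sys/types.h>
#include <unistd.h>     /* pread(), pwrite(), close() */

#define CHUNK 4096

/*
 * Illustration only: a plain memfd stands in for a backing store.
 * Hypothetical helper, not part of any kernel or KVM API.
 *
 * Returns 1 if the punched range reads back as zeros,
 *         0 if any old data survived the hole punch,
 *        -1 on error.
 */
int punch_discards_data(void)
{
	char buf[CHUNK];
	size_t i;
	int fd, ret = -1;

	fd = memfd_create("demo-backing", 0);
	if (fd < 0)
		return -1;

	/* Populate one chunk with a recognizable pattern. */
	memset(buf, 0xAB, sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	/* Free the backing pages but keep the file size unchanged. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, CHUNK) != 0)
		goto out;

	/* Reading the hole must return zeros, never the old pattern. */
	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	ret = 1;
	for (i = 0; i < sizeof(buf); i++) {
		if (buf[i] != 0) {
			ret = 0;
			break;
		}
	}
out:
	close(fd);
	return ret;
}
```

Calling punch_discards_data() from a small main() on a Linux host should
return 1. That loss of contents is the regular-file analogue of why a
spurious hole punch leaves the guest with nothing to recover.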