On 9/4/24 05:07, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Add a new ioctl for the user space VMM to initialize guest memory with the
> specified memory contents.
> Because TDX protects the guest's memory, the creation of the initial guest
> memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead of
> directly copying the memory contents into the guest's memory in the case of
> the default VM type.
> Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped
> KVM_MEMORY_ENCRYPT_OP. Check if the GFN is already pre-allocated, assign
> the guest page in Secure-EPT, copy the initial memory contents into the
> guest memory, and encrypt the guest memory. Optionally, extend the memory
> measurement of the TDX guest.
> Discussion history:
While the discussion history is useful for the reviewers, in the end this
is the simplest possible userspace API (the one that we started with), and
the objections just went away because it reuses the infrastructure that
was introduced for pre-faulting memory.
So I'd replace everything with:
---
The ioctl uses the vCPU file descriptor because of the TDX module's
requirement that the memory is added to the S-EPT (via TDH.MEM.SEPT.ADD)
prior to initialization (TDH.MEM.PAGE.ADD). Accessing the MMU in turn
requires a vCPU file descriptor, just like for KVM_PRE_FAULT_MEMORY. In
fact, the post-populate callback is able to reuse the same logic used by
KVM_PRE_FAULT_MEMORY, so that userspace can do everything with a single
ioctl.
Note that this is the only way to invoke TDH.MEM.SEPT.ADD before the TD
is finalized, as userspace cannot use KVM_PRE_FAULT_MEMORY at that
point. This ensures that there cannot be pages in the S-EPT awaiting
TDH.MEM.PAGE.ADD, which would be treated incorrectly as spurious by
tdp_mmu_map_handle_target_level() (KVM would see the SPTE as PRESENT,
but the corresponding S-EPT entry will be !PRESENT).
---
Part of the second paragraph comes from your link [4],
https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@xxxxxxxxxx/, but updated
for recent changes to KVM_PRE_FAULT_MEMORY.
This drops the historical information that is not particularly relevant
for the future, updates the relevant parts to mention the changes done
for SEV-SNP, and preserves most of the other information:
* why the vCPU file descriptor
* the desirability of a single ioctl for userspace
* the relationship between KVM_TDX_INIT_MEM_REGION and KVM_PRE_FAULT_MEMORY
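For illustration only, here is a rough sketch of how userspace might
prepare the argument for that single ioctl. The struct layouts, the
`sketch_` names, the subcommand number, the measurement flag and the
page-alignment checks are all assumptions made for this example; the real
definitions are whatever the series' uapi header provides.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real uapi definitions. */
#define SKETCH_PAGE_SIZE                 4096u
#define SKETCH_TDX_INIT_MEM_REGION       0u         /* placeholder subcommand id */
#define SKETCH_TDX_MEASURE_MEMORY_REGION (1u << 0)  /* assumed "extend measurement" flag */

/* Assumed description of the region to populate. */
struct sketch_tdx_init_mem_region {
	uint64_t source_addr;	/* userspace buffer holding the initial contents */
	uint64_t gpa;		/* guest physical address to populate */
	uint64_t nr_pages;	/* number of 4K pages to add */
};

/* Assumed wrapper passed to the vCPU-scoped KVM_MEMORY_ENCRYPT_OP. */
struct sketch_tdx_cmd {
	uint32_t id;		/* subcommand */
	uint32_t flags;		/* e.g. whether to extend the measurement */
	uint64_t data;		/* pointer to the subcommand payload */
};

/*
 * Fill in the command and its payload.  Returns 0 on success, -1 if the
 * region is empty or either address is not page aligned (an assumed
 * sanity check for this sketch).
 */
static int sketch_prepare_init_mem_region(struct sketch_tdx_cmd *cmd,
					  struct sketch_tdx_init_mem_region *region,
					  const void *src, uint64_t gpa,
					  uint64_t nr_pages, int measure)
{
	uint64_t src_addr = (uint64_t)(uintptr_t)src;

	if (nr_pages == 0 || gpa % SKETCH_PAGE_SIZE || src_addr % SKETCH_PAGE_SIZE)
		return -1;

	memset(region, 0, sizeof(*region));
	region->source_addr = src_addr;
	region->gpa = gpa;
	region->nr_pages = nr_pages;

	memset(cmd, 0, sizeof(*cmd));
	cmd->id = SKETCH_TDX_INIT_MEM_REGION;
	cmd->flags = measure ? SKETCH_TDX_MEASURE_MEMORY_REGION : 0;
	cmd->data = (uint64_t)(uintptr_t)region;
	return 0;
}
```

Userspace would then pass the prepared command to KVM_MEMORY_ENCRYPT_OP
on the vCPU file descriptor, before the TD is finalized; no separate
KVM_PRE_FAULT_MEMORY call is involved.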
Paolo