On Mon, Jan 15, 2024, Michael Roth wrote: > On Wed, Jan 10, 2024 at 07:45:36AM -0800, Sean Christopherson wrote: > > On Sat, Dec 30, 2023, Michael Roth wrote: > > > diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > index b1beb2fe8766..d4325b26724c 100644 > > > --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > @@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error > > > > > > See the SEV-SNP specification for further detail on the launch input. > > > > > > +20. KVM_SNP_LAUNCH_UPDATE > > > +------------------------- > > > + > > > +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also > > > +calculates a measurement of the memory contents. The measurement is a signature > > > +of the memory contents that can be sent to the guest owner as an attestation > > > +that the memory was encrypted correctly by the firmware. > > > + > > > +Parameters (in): struct kvm_snp_launch_update > > > + > > > +Returns: 0 on success, -negative on error > > > + > > > +:: > > > + > > > + struct kvm_sev_snp_launch_update { > > > + __u64 start_gfn; /* Guest page number to start from. */ > > > + __u64 uaddr; /* userspace address need to be encrypted */ > > > > Huh? Why is KVM taking a userspace address? IIUC, the address unconditionally > > gets translated into a gfn, so why not pass a gfn? > > > > And speaking of gfns, AFAICT start_gfn is never used. > > I think having both the uaddr and start_gfn parameters makes sense. I > think it's only awkward because how I'm using the memslot to translate > the uaddr to a GFN in the current implementation, Yes. > > Oof, reading more of the code, this *requires* an effective in-place copy-and-convert > > of guest memory. > > Yes, I'm having some trouble locating the various threads, but initially > there were some discussions about having a userspace API that allows for > UPM/gmem pages to be pre-populated before they are in-place encrypted, but > we'd all eventually decided that having KVM handle this internally was > the simplest approach. > > So that's how it's done here, KVM_SNP_LAUNCH_UPDATE copies the pages into > gmem, then passes those pages on to firmware for encryption. Then the > VMM is expected to mark the GFN range as private via > KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes > initial private/encrypted payload. I should document that better in > KVM_SNP_LAUNCH_UPDATE docs however. That's fine. As above, my complaints are that the unencrypted source *must* be covered by a memslot. That's beyond ugly. > > What's "the IMI"? Initial Measurement Image? I assume this is essentially just > > a flag that communicates whether or not the page should be measured? > > This is actually for loading a measured migration agent into the target > system so that it can handle receiving the encrypted guest data from the > source. There's still a good deal of planning around how live migration > will be handled however so if you think it's premature to expose this > via KVM I can remove the related fields. Yes, please. Though FWIW, I honestly hope KVM_SEV_SNP_LAUNCH_UPDATE goes away and we end up with a common uAPI for populating guest memory: https://lore.kernel.org/all/Zbrj5WKVgMsUFDtb@xxxxxxxxxx > > > + __u8 page_type; /* page type */ > > > + __u8 vmpl3_perms; /* VMPL3 permission mask */ > > > + __u8 vmpl2_perms; /* VMPL2 permission mask */ > > > + __u8 vmpl1_perms; /* VMPL1 permission mask */ > > > > Why? KVM doesn't support VMPLs. > > It does actually get used by the SVSM. > I can remove these but then we'd need some capability bit or something to > know when they are available if they get re-introduced. _If_. We don't merge dead code, and we _definitely_ don't merge dead code that creates ABI. > > > + kvaddr = pfn_to_kaddr(pfns[i]); > > > + if (!virt_addr_valid(kvaddr)) { > > > > I really, really don't like that this assume guest_memfd is backed by struct page. > > There are similar enforcements in the SEV/SEV-ES code. There was some > initial discussion about relaxing this for SNP so we could support > things like /dev/mem-mapped guest memory, but then guest_memfd came > along and made that to be an unlikely use-case in the near-term given > that it relies on alloc_pages() currently and explicitly guards against > mmap()'ing pages in userspace. > > I think it makes to keep the current tightened restrictions in-place > until such a use-case comes along, since otherwise we are relaxing a > bunch of currently-useful sanity checks that span all throughout the code > to support some nebulous potential use-case that might never come along. > I think it makes more sense to cross that bridge when we get there. I disagree. You say "sanity checks", while I see a bunch of arbitrary assumptions that don't need to exist. Yes, today guest_memfd always uses struct page memory, but that should have _zero_ impact on SNP. Arbitrary assumptions often cause a lot of confusion for future readers, e.g. a few years from now, if the above code still exists, someone might wonder what is so special about struct page memory, and then waste a bunch of time trying to figure out that there's no technical reason SNP "requires" struct page memory. This is partly why I was pushing for guest_memfd to handle some of this; the gmem code _knows_ what backing type it's using, it _knows_ if the direct map is valid, etc. > > > + pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn); > > > + ret = -EINVAL; > > > + goto e_release; > > > + } > > > + > > > + ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE); > > > > Good gravy. If I'm reading this correctly, KVM: > > > > 1. Translates an HVA into a GFN. > > 2. Gets the PFN for that GFN from guest_memfd > > 3. Verifies the PFN is not assigned to the guest > > 4. Copies memory from the shared memslot page to the guest_memfd page > > 5. Converts the page to private and asks the PSP to encrypt it > > > > (a) As above, why is #1 a thing? > > Yah, it's probably best to avoid this, as proposed above. > > > (b) Why are KVM's memory attributes never consulted? > > It doesn't really matter if the attributes are set before or after > KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches > they pages get set to private so they get faulted in from gmem. We could > document our expectations and enforce them here if that's preferable > however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance > would make it easier to enforce that userspace does the right thing. > I'll see how that looks if there are no objections. Userspace owns whether a page is PRIVATE or SHARED, full stop. If KVM can't honor that, then we need to come up with better uAPI. > > (c) What prevents TOCTOU issues with respect to the RMP? > > Time-of-use will be when the guest faults the gmem page in with C-bit > set. If it is not in the expected Guest-owned/pre-validated state that > SEV_CMD_SNP_LAUNCH_UPDATE expected/set, then the guest will get an RMP > fault or #VC exception for unvalidated page access. It will also fail > attestation. Not sure if that covers the scenarios you had in mind. I don't think this covers what I was asking, but I suspect my concern will go away once the new APIs come along, so let's table this for now. > > > (d) Why is *KVM* copying memory into guest_memfd? > > As mentioned above, there were various discussions of ways of allowing > userspace to pwrite() to the guest_memfd in advance of > "sealing"/"binding" it and then encrypting it in place. I think this was > one of the related threads: > > https://lore.kernel.org/linux-mm/YkyKywkQYbr9U0CA@xxxxxxxxxx/ > > My read at the time was that the requirements between pKVM/TDX/SNP were all > so unique that we'd spin forever trying to come up with a userspace ABI > that worked for everyone. At the time you'd suggested for pKVM to handle > their specific requirements internally in pKVM to avoid unecessary churn on > TDX/SNP side, and I took the same approach with SNP in implementing it > internally in SNP's KVM interfaces since it seemed unlikely there would > be much common ground with how TDX handles it via KVM_TDX_INIT_MEM_REGION. Yeah, the whole "intra-memslot copy" thing threw me. > > (e) What guarantees the direct map is valid for guest_memfd? > > Are you suggesting this may change in the near-term? I asking because, when I asked, I was unaware that the plan was to shatter the direct to address the 2MiB vs. 4KiB erratum (as opposed to unmapping guest_memfd pfns). > > > + if (ret) { > > > + pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n", > > > + ret, *error); > > > + snp_page_reclaim(pfns[i]); > > > + > > > + /* > > > + * When invalid CPUID function entries are detected, the firmware > > > + * corrects these entries for debugging purpose and leaves the > > > + * page unencrypted so it can be provided users for debugging > > > + * and error-reporting. > > > + * > > > + * Copy the corrected CPUID page back to shared memory so > > > + * userpsace can retrieve this information. > > > > Why? IIUC, this is basically backdooring reads/writes into guest_memfd to avoid > > having to add proper mmap() support. > > The CPUID page is private/encrypted, so it needs to be a gmem page. > SNP firmware is doing the backdooring when it writes the unencrypted, > expected contents into the page during failure. What's wrong with copying > the contents back into the source page so userspace can be use of it? > If we implement the above-mentioned changes then the source page is just > a userspace buffer that isn't necessarily associated with a memslot, so > it becomes even more straightforward. > > Would that be acceptable? Yes, I am specifically complaining about writing guest memory on failure, which is all kinds of weird. > > > + kvfree(pfns); > > > > Saving PFNs from guest_memfd, which is fully owned by KVM, is so unnecessarily > > complex. Add a guest_memfd API (or three) to do this safely, e.g. to lock the > > pages, do (and track) the RMP conversion, etc... > > Is adding 3 gmem APIs and tracking RMP states inside gmem really less > complex than what's going on here? I think we covered this in PUCK? Holler if you still have questions here.