On Sat, Dec 30, 2023, Michael Roth wrote: > From: Brijesh Singh <brijesh.singh@xxxxxxx> > > The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into > the guest's memory. The data is encrypted with the cryptographic context > created with the KVM_SEV_SNP_LAUNCH_START. > > In addition to the inserting data, it can insert a two special pages > into the guests memory: the secrets page and the CPUID page. > > While terminating the guest, reclaim the guest pages added in the RMP > table. If the reclaim fails, then the page is no longer safe to be > released back to the system and leak them. > > For more information see the SEV-SNP specification. Please rewrite all changelogs to explain what *KVM* support is being added, why the proposed uAPI looks like it does, and how the new uAPI is intended be used. Porividing a crash course on the relevant hardware behavior is definitely helpful, but the changelog absolutely needs to explain/justify the patch. > Co-developed-by: Michael Roth <michael.roth@xxxxxxx> > Signed-off-by: Michael Roth <michael.roth@xxxxxxx> > Signed-off-by: Brijesh Singh <brijesh.singh@xxxxxxx> > Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx> > --- > .../virt/kvm/x86/amd-memory-encryption.rst | 28 +++ > arch/x86/kvm/svm/sev.c | 181 ++++++++++++++++++ > include/uapi/linux/kvm.h | 19 ++ > 3 files changed, 228 insertions(+) > > diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst > index b1beb2fe8766..d4325b26724c 100644 > --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst > +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst > @@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error > > See the SEV-SNP specification for further detail on the launch input. > > +20. KVM_SNP_LAUNCH_UPDATE > +------------------------- > + > +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also > +calculates a measurement of the memory contents. The measurement is a signature > +of the memory contents that can be sent to the guest owner as an attestation > +that the memory was encrypted correctly by the firmware. > + > +Parameters (in): struct kvm_snp_launch_update > + > +Returns: 0 on success, -negative on error > + > +:: > + > + struct kvm_sev_snp_launch_update { > + __u64 start_gfn; /* Guest page number to start from. */ > + __u64 uaddr; /* userspace address need to be encrypted */ Huh? Why is KVM taking a userspace address? IIUC, the address unconditionally gets translated into a gfn, so why not pass a gfn? And speaking of gfns, AFAICT start_gfn is never used. Oof, reading more of the code, this *requires* an effective in-place copy-and-convert of guest memory. > + __u32 len; /* length of memory region */ Bytes? Pages? One field above operates on frame numbers, one apparently operates on a byte-granularity address. > + __u8 imi_page; /* 1 if memory is part of the IMI */ What's "the IMI"? Initial Measurement Image? I assume this is essentially just a flag that communicates whether or not the page should be measured? > + __u8 page_type; /* page type */ > + __u8 vmpl3_perms; /* VMPL3 permission mask */ > + __u8 vmpl2_perms; /* VMPL2 permission mask */ > + __u8 vmpl1_perms; /* VMPL1 permission mask */ Why? KVM doesn't support VMPLs. > +static int snp_page_reclaim(u64 pfn) > +{ > + struct sev_data_snp_page_reclaim data = {0}; > + int err, rc; > + > + data.paddr = __sme_set(pfn << PAGE_SHIFT); > + rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err); > + if (rc) { > + /* > + * If the reclaim failed, then page is no longer safe > + * to use. Uh, why can reclaim fail, and why does the kernel apparently not care about leaking pages? AFAICT, nothing ever complains beyond a pr_debug. That sounds bonkers to me, i.e. at the very minimum, why doesn't this warrant a WARN_ON_ONCE? > + */ > + snp_leak_pages(pfn, 1); > + } > + > + return rc; > +} > + > +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak) > +{ > + int rc; > + > + rc = rmp_make_shared(pfn, level); > + if (rc && leak) > + snp_leak_pages(pfn, > + page_level_size(level) >> PAGE_SHIFT); Completely unnecessary wrap. > + > + return rc; > +} > + > static void sev_unbind_asid(struct kvm *kvm, unsigned int handle) > { > struct sev_data_deactivate deactivate; > @@ -1990,6 +2020,154 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp) > return rc; > } > > +static int snp_launch_update_gfn_handler(struct kvm *kvm, > + struct kvm_gfn_range *range, > + void *opaque) > +{ > + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; > + struct kvm_memory_slot *memslot = range->slot; > + struct sev_data_snp_launch_update data = {0}; > + struct kvm_sev_snp_launch_update params; > + struct kvm_sev_cmd *argp = opaque; > + int *error = &argp->error; > + int i, n = 0, ret = 0; > + unsigned long npages; > + kvm_pfn_t *pfns; > + gfn_t gfn; > + > + if (!kvm_slot_can_be_private(memslot)) { > + pr_err("SEV-SNP requires private memory support via guest_memfd.\n"); Yeah, no. Sprinkling pr_err() all over the place in user-triggerable error paths is not acceptable. I get that it's often hard to extract "what went wrong" out of the kernel, but adding pr_err() everywhere is not a viable solution. > + return -EINVAL; > + } > + > + if (copy_from_user(¶ms, (void __user *)(uintptr_t)argp->data, sizeof(params))) { > + pr_err("Failed to copy user parameters for SEV-SNP launch.\n"); > + return -EFAULT; > + } > + > + data.gctx_paddr = __psp_pa(sev->snp_context); > + > + npages = range->end - range->start; > + pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT); > + if (!pfns) > + return -ENOMEM; > + > + pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__, > + range->start, range->end, params.page_type); > + > + for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) { > + int order, level; > + bool assigned; > + void *kvaddr; > + > + ret = __kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order, false); > + if (ret) > + goto e_release; > + > + n++; > + ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level); > + if (ret || assigned) { > + pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n", > + gfn, ret, assigned); > + return -EFAULT; > + } > + > + kvaddr = pfn_to_kaddr(pfns[i]); > + if (!virt_addr_valid(kvaddr)) { I really, really don't like that this assume guest_memfd is backed by struct page. > + pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn); > + ret = -EINVAL; > + goto e_release; > + } > + > + ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE); Good gravy. If I'm reading this correctly, KVM: 1. Translates an HVA into a GFN. 2. Gets the PFN for that GFN from guest_memfd 3. Verifies the PFN is not assigned to the guest 4. Copies memory from the shared memslot page to the guest_memfd page 5. Converts the page to private and asks the PSP to encrypt it (a) As above, why is #1 a thing? (b) Why are KVM's memory attributes never consulted? (c) What prevents TOCTOU issues with respect to the RMP? (d) Why is *KVM* copying memory into guest_memfd? (e) What guarantees the direct map is valid for guest_memfd? (f) Why does KVM's uAPI *require* the source page to come from the same memslot? > + if (ret) { > + pr_err("Guest read failed, ret: 0x%x\n", ret); > + goto e_release; > + } > + > + ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K, > + sev_get_asid(kvm), true); > + if (ret) { > + ret = -EFAULT; > + goto e_release; > + } > + > + data.address = __sme_set(pfns[i] << PAGE_SHIFT); > + data.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K); > + data.page_type = params.page_type; > + data.vmpl3_perms = params.vmpl3_perms; > + data.vmpl2_perms = params.vmpl2_perms; > + data.vmpl1_perms = params.vmpl1_perms; > + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, > + &data, error); > + if (ret) { > + pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n", > + ret, *error); > + snp_page_reclaim(pfns[i]); > + > + /* > + * When invalid CPUID function entries are detected, the firmware > + * corrects these entries for debugging purpose and leaves the > + * page unencrypted so it can be provided users for debugging > + * and error-reporting. > + * > + * Copy the corrected CPUID page back to shared memory so > + * userpsace can retrieve this information. Why? IIUC, this is basically backdooring reads/writes into guest_memfd to avoid having to add proper mmap() support. > + */ > + if (params.page_type == SNP_PAGE_TYPE_CPUID && > + *error == SEV_RET_INVALID_PARAM) { > + int ret; Ugh, do not shadow variables. > + > + host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true); > + > + ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE); > + if (ret) > + pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n", > + ret); > + } > + > + goto e_release; > + } > + } > + > +e_release: > + /* Content of memory is updated, mark pages dirty */ > + for (i = 0; i < n; i++) { > + set_page_dirty(pfn_to_page(pfns[i])); > + mark_page_accessed(pfn_to_page(pfns[i])); > + > + /* > + * If its an error, then update RMP entry to change page ownership > + * to the hypervisor. > + */ > + if (ret) > + host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true); > + > + put_page(pfn_to_page(pfns[i])); > + } > + > + kvfree(pfns); Saving PFNs from guest_memfd, which is fully owned by KVM, is so unnecessarily complex. Add a guest_memfd API (or three) to do this safely, e.g. to lock the pages, do (and track) the RMP conversion, etc...