On Mon, Mar 07, 2022, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky@xxxxxxx>
>
> To provide a more secure way to start APs under SEV-SNP, use the SEV-SNP
> AP Creation NAE event. This allows for guest control over the AP register
> state rather than trusting the hypervisor with the SEV-ES Jump Table
> address.
>
> During native_smp_prepare_cpus(), invoke an SEV-SNP function that, if
> SEV-SNP is active, will set/override apic->wakeup_secondary_cpu. This
> will allow the SEV-SNP AP Creation NAE event method to be used to boot
> the APs. As a result of installing the override when SEV-SNP is active,
> this method of starting the APs becomes the required method. The override
> function will fail to start the AP if the hypervisor does not have
> support for AP creation.

...

> @@ -823,6 +843,230 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
>  	pvalidate_pages(vaddr, npages, true);
>  }
>
> +static int snp_set_vmsa(void *va, bool vmsa)
> +{
> +	u64 attrs;
> +
> +	/*
> +	 * Running at VMPL0 allows the kernel to change the VMSA bit for a page
> +	 * using the RMPADJUST instruction. However, for the instruction to
> +	 * succeed it must target the permissions of a lesser privileged
> +	 * VMPL level, so use VMPL1 (refer to the RMPADJUST instruction in the
> +	 * AMD64 APM Volume 3).
> +	 */
> +	attrs = 1;
> +	if (vmsa)
> +		attrs |= RMPADJUST_VMSA_PAGE_BIT;
> +
> +	return rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
> +}
> +
> +#define __ATTR_BASE		(SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
> +#define INIT_CS_ATTRIBS		(__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
> +#define INIT_DS_ATTRIBS		(__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
> +
> +#define INIT_LDTR_ATTRIBS	(SVM_SELECTOR_P_MASK | 2)
> +#define INIT_TR_ATTRIBS		(SVM_SELECTOR_P_MASK | 3)
> +
> +static void *snp_alloc_vmsa_page(void)
> +{
> +	struct page *p;
> +
> +	/*
> +	 * Allocate VMSA page to work around the SNP erratum where the CPU will
> +	 * incorrectly signal an RMP violation #PF if a large page (2MB or 1GB)
> +	 * collides with the RMP entry of VMSA page. The recommended workaround
> +	 * is to not use a large page.
> +	 */
> +
> +	/* Allocate an 8k page which is also 8k-aligned */
> +	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> +	if (!p)
> +		return NULL;
> +
> +	split_page(p, 1);
> +
> +	/* Free the first 4k. This page may be 2M/1G aligned and cannot be used. */
> +	__free_page(p);
> +
> +	return page_address(p + 1);
> +}
> +
> +static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa)
> +{
> +	int err;
> +
> +	err = snp_set_vmsa(vmsa, false);

Uh, so what happens if a malicious guest does RMPADJUST to convert a VMSA
page back to a "normal" page while the host is trying to VMRUN that VMSA?
Does VMRUN fault?

Can Linux refuse to support this madness and instead require the ACPI MP
wakeup protocol being proposed/implemented for TDX?  That would allow KVM
to have at least a chance of refusing to support AP "creation", which IMO
is a CVE or three waiting to happen.  From a KVM perspective, I don't ever
want to be running a guest-defined VMSA.

https://lore.kernel.org/all/YWnbfCet84Vup6q9@xxxxxxxxxx

> +	if (err)
> +		pr_err("clear VMSA page failed (%u), leaking page\n", err);
> +	else
> +		free_page((unsigned long)vmsa);
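
(For anyone reading along without the full series handy: the rmpadjust()
helper called by snp_set_vmsa() above is, as I understand it, just a thin
wrapper around the RMPADJUST instruction. A minimal sketch going by the
AMD64 APM Vol 3 operand/encoding description, not necessarily the exact
helper in this series:

static inline int rmpadjust(unsigned long vaddr, bool rmp_psize,
			    unsigned long attrs)
{
	int rc;

	/*
	 * RMPADJUST (opcode F3 0F 01 FE) takes the target virtual address
	 * in RAX, the page size in RCX, and the VMPL/permission attributes
	 * in RDX, and returns 0 in RAX on success or a failure code
	 * otherwise. Encode it as raw bytes since only binutils 2.36 and
	 * newer know the mnemonic.
	 */
	asm volatile(".byte 0xF3,0x0F,0x01,0xFE\n\t"
		     : "=a" (rc)
		     : "a" (vaddr), "c" (rmp_psize), "d" (attrs)
		     : "memory", "cc");

	return rc;
}

Note that this executes entirely in the guest at VMPL0 and flips the RMP
entry directly in hardware, with no hypervisor involvement, which is
exactly why the VMRUN-vs-RMPADJUST ordering question above matters.)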