On 4/6/21 5:33 AM, Borislav Petkov wrote: > On Wed, Mar 24, 2021 at 11:44:17AM -0500, Brijesh Singh wrote: >> Many of the integrity guarantees of SEV-SNP are enforced through the >> Reverse Map Table (RMP). Each RMP entry contains the GPA at which a >> particular page of DRAM should be mapped. The VMs can request the >> hypervisor to add pages in the RMP table via the Page State Change VMGEXIT >> defined in the GHCB specification section 2.5.1 and 4.1.6. Inside each RMP >> entry is a Validated flag; this flag is automatically cleared to 0 by the >> CPU hardware when a new RMP entry is created for a guest. Each VM page >> can be either validated or invalidated, as indicated by the Validated >> flag in the RMP entry. Memory access to a private page that is not >> validated generates a #VC. A VM can use PVALIDATE instruction to validate >> the private page before using it. > I guess this should say "A VM must use the PVALIDATE insn to validate > that private page before using it." Otherwise it can't use it, right. > Thus the "must" and not "can". Noted, I should have used "must". > >> To maintain the security guarantee of SEV-SNP guests, when transitioning >> a memory from private to shared, the guest must invalidate the memory range >> before asking the hypervisor to change the page state to shared in the RMP >> table. > So first you talk about memory pages, now about memory range... > >> After the page is mapped private in the page table, the guest must issue a > ... and now about pages again. Let's talk pages only pls. Noted, I will stick to memory pages. thanks > >> page state change VMGEXIT to make the memory private in the RMP table and >> validate it. If the memory is not validated after its added in the RMP table >> as private, then a VC exception (page-not-validated) will be raised. > Didn't you just say this already above? Yes I said it in the start of the commit, I will work to avoid the repetition. >> We do > Who's "we"? > >> not support the page-not-validated exception yet, so it will crash the guest. >> >> On boot, BIOS should have validated the entire system memory. During >> the kernel decompression stage, the VC handler uses the >> set_memory_decrypted() to make the GHCB page shared (i.e clear encryption >> attribute). And while exiting from the decompression, it calls the >> set_memory_encyrpted() to make the page private. > Hmm, that commit message needs reorganizing, from > Documentation/process/submitting-patches.rst: > > "Describe your changes in imperative mood, e.g. "make xyzzy do frotz" > instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy > to do frotz", as if you are giving orders to the codebase to change > its behaviour." > > So this should say something along the lines of "Add helpers for validating > pages in the decompression stage" or so. I will improve the commit message to avoid using the "[we]" or "[I]". Add helpers looks good, thanks. >> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> >> Cc: Ingo Molnar <mingo@xxxxxxxxxx> >> Cc: Borislav Petkov <bp@xxxxxxxxx> >> Cc: Joerg Roedel <jroedel@xxxxxxx> >> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> >> Cc: Tony Luck <tony.luck@xxxxxxxxx> >> Cc: Dave Hansen <dave.hansen@xxxxxxxxx> >> Cc: "Peter Zijlstra (Intel)" <peterz@xxxxxxxxxxxxx> >> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> >> Cc: Tom Lendacky <thomas.lendacky@xxxxxxx> >> Cc: David Rientjes <rientjes@xxxxxxxxxx> >> Cc: Sean Christopherson <seanjc@xxxxxxxxxx> >> Cc: x86@xxxxxxxxxx >> Cc: kvm@xxxxxxxxxxxxxxx > Btw, you don't really need to add those CCs to the patch - it is enough > if you Cc the folks when you send the patches with git. Noted. > >> Signed-off-by: Brijesh Singh <brijesh.singh@xxxxxxx> >> --- >> arch/x86/boot/compressed/Makefile | 1 + >> arch/x86/boot/compressed/ident_map_64.c | 18 ++++ >> arch/x86/boot/compressed/sev-snp.c | 115 ++++++++++++++++++++++++ >> arch/x86/boot/compressed/sev-snp.h | 25 ++++++ >> 4 files changed, 159 insertions(+) >> create mode 100644 arch/x86/boot/compressed/sev-snp.c >> create mode 100644 arch/x86/boot/compressed/sev-snp.h >> >> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile >> index e0bc3988c3fa..4d422aae8a86 100644 >> --- a/arch/x86/boot/compressed/Makefile >> +++ b/arch/x86/boot/compressed/Makefile >> @@ -93,6 +93,7 @@ ifdef CONFIG_X86_64 >> vmlinux-objs-y += $(obj)/mem_encrypt.o >> vmlinux-objs-y += $(obj)/pgtable_64.o >> vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o >> + vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-snp.o > Yeah, as before, make that a single sev.o and put everything in it. Yes, all the SNP changes will be merged into sev.c. > >> endif >> >> vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o >> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c >> index f7213d0943b8..0a420ce5550f 100644 >> --- a/arch/x86/boot/compressed/ident_map_64.c >> +++ b/arch/x86/boot/compressed/ident_map_64.c >> @@ -37,6 +37,8 @@ >> #include <asm/setup.h> /* For COMMAND_LINE_SIZE */ >> #undef _SETUP >> >> +#include "sev-snp.h" >> + >> extern unsigned long get_cmd_line_ptr(void); >> >> /* Used by PAGE_KERN* macros: */ >> @@ -278,12 +280,28 @@ static int set_clr_page_flags(struct x86_mapping_info *info, >> if ((set | clr) & _PAGE_ENC) >> clflush_page(address); >> >> + /* >> + * If the encryption attribute is being cleared, then change the page state to >> + * shared in the RMP entry. Change of the page state must be done before the >> + * PTE updates. >> + */ >> + if (clr & _PAGE_ENC) >> + sev_snp_set_page_shared(pte_pfn(*ptep) << PAGE_SHIFT); > The statement above already looks at clr - just merge the two together. Noted. > >> + >> /* Update PTE */ >> pte = *ptep; >> pte = pte_set_flags(pte, set); >> pte = pte_clear_flags(pte, clr); >> set_pte(ptep, pte); >> >> + /* >> + * If the encryption attribute is being set, then change the page state to >> + * private in the RMP entry. The page state must be done after the PTE >> + * is updated. >> + */ >> + if (set & _PAGE_ENC) >> + sev_snp_set_page_private(pte_pfn(*ptep) << PAGE_SHIFT); >> + >> /* Flush TLB after changing encryption attribute */ >> write_cr3(top_level_pgt); >> >> diff --git a/arch/x86/boot/compressed/sev-snp.c b/arch/x86/boot/compressed/sev-snp.c >> new file mode 100644 >> index 000000000000..5c25103b0df1 >> --- /dev/null >> +++ b/arch/x86/boot/compressed/sev-snp.c >> @@ -0,0 +1,115 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* >> + * AMD SEV SNP support >> + * >> + * Author: Brijesh Singh <brijesh.singh@xxxxxxx> >> + * >> + */ >> + >> +#include "misc.h" >> +#include "error.h" >> + >> +#include <asm/msr-index.h> >> +#include <asm/sev-snp.h> >> +#include <asm/sev-es.h> >> + >> +#include "sev-snp.h" >> + >> +static bool sev_snp_enabled(void) >> +{ >> + unsigned long low, high; >> + u64 val; >> + >> + asm volatile("rdmsr\n" : "=a" (low), "=d" (high) : >> + "c" (MSR_AMD64_SEV)); >> + >> + val = (high << 32) | low; >> + >> + if (val & MSR_AMD64_SEV_SNP_ENABLED) >> + return true; >> + >> + return false; >> +} > arch/x86/boot/compressed/mem_encrypt.S already touches > MSR_AMD64_SEV - you can extend that function there and cache the > MSR_AMD64_SEV_SNP_ENABLED too, depending on where you need it. That > function is called in .code32 though. > > If not, you should at least cache the MSR so that you don't have to read > it each time. I think we will not be able to use the status saved from the mem_encrypt.S. The call sequence is something like this: arch/x86/boot/compressed/head_64.S call get_sev_encryption_bit arch/x86/boot/compressed/mem_encrypt.S get_sev_encryption_bit: // Reads the Memory encryption CPUID Fn8000_001F // Check if the SEV is available // If SEV is available then call the rdmsr MSR_AMD64_SEV When SEV-{ES,SNP} is enabled, read CPUID Fn8000_001F will cause a #VC. Inside the #VC handler, the first thing we do is setup the early GHCB. While setting up the GHCB we check if SNP is enabled. If SNP is enabled then perform additional setup (e.g GHCB GPA request etc). I will try caching the value on first read. > >> + >> +/* Provides sev_snp_{wr,rd}_ghcb_msr() */ >> +#include "sev-common.c" >> + >> +/* Provides sev_es_terminate() */ >> +#include "../../kernel/sev-common-shared.c" >> + >> +static void sev_snp_pages_state_change(unsigned long paddr, int op) > no need for too many prefixes on static functions - just call this one > __change_page_state() or so, so that the below one can be called... Noted. >> +{ >> + u64 pfn = paddr >> PAGE_SHIFT; >> + u64 old, val; >> + >> + /* save the old GHCB MSR */ >> + old = sev_es_rd_ghcb_msr(); > Why do you need to save/restore GHCB MSR? Other callers simply go and > write into it the new command... Before the GHCB is established the caller does not need to save and restore MSRs. The page_state_change() uses the GHCB MSR protocol and it can be called before and after the GHCB is established hence I am saving and restoring GHCB MSRs. > >> + >> + /* Issue VMGEXIT to change the page state */ >> + sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(pfn, op)); >> + VMGEXIT(); >> + >> + /* Read the response of the VMGEXIT */ >> + val = sev_es_rd_ghcb_msr(); >> + if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP) || >> + (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0)) > No need for the "!= 0" Noted. > >> + sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST); > So what does that mean? > > *Any* and *all* page state changes which fail immediately terminate a > guest? Why? The hypervisor uses the RMPUPDATE instruction to add the pages in the RMP table. If RMPUPDATE fails, then it will be communicated to the guest. Now its up to guest on what it wants to do. I choose to terminate because guest can't resolve this step on its own. It needs help from the hypervisor and hypervisor has bailed on it. Depending on request type, the next step will either fail or we go into infinite loop. Lets consider an example: 1. Guest asked to add a page as a private in RMP table. 2. Hypervisor fail to add the page in the RMP table and return an error. 3. Guest ignored the error code and moved to the step to validate the page. 4. The page validation instruction expects that page must be added in the RMP table. In our case the page was not added in the RMP table. So it will cause #NPF (rmp violation). 5. On #NPF, hypervisor will try adding the page as private but it will fail (same as #2). This will keep repeating and guest will not make any progress. I choose to return "void" from page_state_change() because caller can't do anything with error code. Some of the failure may have security implication, terminate the guest as soon as we detect an error condition. > Then, how do we communicate this to the guest user what has happened? > > Can GHCB_SEV_ES_REASON_GENERAL_REQUEST be something special like > > GHCB_SEV_ES_REASON_PSC_FAILURE > > or so, so that users know what has happened? Current GHCB does not have special code for this. But I think Linux guest can define a special code which can be used to indicate the termination reason. Tom, Any other suggestion ? > >> + /* Restore the GHCB MSR value */ >> + sev_es_wr_ghcb_msr(old); >> +} >> + >> +static void sev_snp_issue_pvalidate(unsigned long paddr, bool validate) > That one you can call simply "pvalidate" and then the layering with > __pvalidate works too. Yes. >> +{ >> + unsigned long eflags; >> + int rc; >> + >> + rc = __pvalidate(paddr, RMP_PG_SIZE_4K, validate, &eflags); >> + if (rc) { >> + error("Failed to validate address"); >> + goto e_fail; >> + } >> + >> + /* Check for the double validation and assert on failure */ >> + if (eflags & X86_EFLAGS_CF) { >> + error("Double validation detected"); >> + goto e_fail; >> + } >> + >> + return; >> +e_fail: >> + sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST); >> +} >> + >> +static void sev_snp_set_page_private_shared(unsigned long paddr, int op) > ... change_page_state() > >> +{ >> + if (!sev_snp_enabled()) >> + return; >> + >> + /* >> + * We are change the page state from private to shared, invalidate the pages before > s/We are// Noted. > >> + * making the page state change in the RMP table. >> + */ >> + if (op == SNP_PAGE_STATE_SHARED) >> + sev_snp_issue_pvalidate(paddr, false); > The new RMP Validated bit is specified in EDX[0]. The C standard defines > > false == 0 > true == 1 > > but make that explicit pls: > > pvalidate(paddr, 0); > pvalidate(paddr, 1); Noted. > >> + >> + /* Request the page state change in the RMP table. */ >> + sev_snp_pages_state_change(paddr, op); >> + >> + /* >> + * Now that pages are added in the RMP table as a private memory, validate the >> + * memory range so that it is consistent with the RMP entry. >> + */ >> + if (op == SNP_PAGE_STATE_PRIVATE) >> + sev_snp_issue_pvalidate(paddr, true); >> +} >> + >> +void sev_snp_set_page_private(unsigned long paddr) >> +{ >> + sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_PRIVATE); >> +} >> + >> +void sev_snp_set_page_shared(unsigned long paddr) >> +{ >> + sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED); >> +} >> diff --git a/arch/x86/boot/compressed/sev-snp.h b/arch/x86/boot/compressed/sev-snp.h >> new file mode 100644 >> index 000000000000..12fe9581a255 >> --- /dev/null >> +++ b/arch/x86/boot/compressed/sev-snp.h > A single sev.h I guess. > >> @@ -0,0 +1,25 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> +/* >> + * AMD SEV Secure Nested Paging Support >> + * >> + * Copyright (C) 2021 Advanced Micro Devices, Inc. >> + * >> + * Author: Brijesh Singh <brijesh.singh@xxxxxxx> >> + */ >> + >> +#ifndef __COMPRESSED_SECURE_NESTED_PAGING_H >> +#define __COMPRESSED_SECURE_NESTED_PAGING_H > Look at how other x86 headers define their guards' format. Noted. > Thx. >