Kernel memory is "encrypted" by default. Some callers may "decrypt" it in order to share it with things outside the kernel like a device or an untrusted VMM. There is nothing to stop set_memory_encrypted() from being passed memory that is already "encrypted" (aka. "private" on TDX). In fact, some callers do this because ... $REASONS. Unfortunately, part of the TDX decrypted=>encrypted transition is truly one way*. It can't handle being asked to encrypt an already encrypted page Allow __set_memory_enc_pgtable() to detect already-encrypted memory before it hits the TDX code. * The one way part is "page acceptance" [commit log written by Dave Hansen] Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx> --- arch/x86/mm/pat/set_memory.c | 41 +++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index bda9f129835e..1238b0db3e33 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -2122,6 +2122,21 @@ int set_memory_global(unsigned long addr, int numpages) __pgprot(_PAGE_GLOBAL), 0); } +static bool kernel_vaddr_encryped(unsigned long addr, bool enc) +{ + unsigned int level; + pte_t *pte; + + pte = lookup_address(addr, &level); + if (!pte) + return false; + + if (enc) + return pte_val(*pte) == cc_mkenc(pte_val(*pte)); + + return pte_val(*pte) == cc_mkdec(pte_val(*pte)); +} + /* * __set_memory_enc_pgtable() is used for the hypervisors that get * informed about "encryption" status via page tables. @@ -2130,7 +2145,7 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc) { pgprot_t empty = __pgprot(0); struct cpa_data cpa; - int ret; + int ret, numpages_in_state = 0; /* Should not be working on unaligned addresses */ if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr)) @@ -2143,6 +2158,30 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc) cpa.mask_clr = enc ? pgprot_decrypted(empty) : pgprot_encrypted(empty); cpa.pgd = init_mm.pgd; + /* + * If any page is already in the right state, bail with an error + * because the code doesn't handled it. This is likely because + * something has gone wrong and isn't worth optimizing for. + * + * If all the memory pages are already in the desired state return + * success. + * + * kernel_vaddr_encryped() does not synchronize against huge page + * splits so take pgd_lock. A caller doing strange things could + * get a new PMD mid level PTE confused with a huge PMD entry. Just + * lock to tie up loose ends. + */ + spin_lock(&pgd_lock); + for (int i = 0; i < numpages; i++) { + if (kernel_vaddr_encryped(addr + (PAGE_SIZE * i), enc)) + numpages_in_state++; + } + spin_unlock(&pgd_lock); + if (numpages_in_state == numpages) + return 0; + else if (numpages_in_state) + return 1; + /* Must avoid aliasing mappings in the highmem code */ kmap_flush_unused(); vm_unmap_aliases(); -- 2.34.1