From: mhkelley58@xxxxxxxxx <mhkelley58@xxxxxxxxx> Sent: Monday, January 15, 2024 6:20 PM > > In a CoCo VM, when transitioning memory from encrypted to decrypted, or > vice versa, the caller of set_memory_encrypted() or set_memory_decrypted() > is responsible for ensuring the memory isn't in use and isn't referenced > while the transition is in progress. The transition has multiple steps, > and the memory is in an inconsistent state until all steps are complete. > A reference while the state is inconsistent could result in an exception > that can't be cleanly fixed up. > > However, the kernel load_unaligned_zeropad() mechanism could cause a stray > reference that can't be prevented by the caller of set_memory_encrypted() > or set_memory_decrypted(), so there's specific code to handle this case. > But a CoCo VM running on Hyper-V may be configured to run with a paravisor, > with the #VC or #VE exception routed to the paravisor. There's no > architectural way to forward the exceptions back to the guest kernel, and > in such a case, the load_unaligned_zeropad() specific code doesn't work. > > To avoid this problem, mark pages as "not present" while a transition > is in progress. If load_unaligned_zeropad() causes a stray reference, a > normal page fault is generated instead of #VC or #VE, and the > page-fault-based fixup handlers for load_unaligned_zeropad() resolve the > reference. When the encrypted/decrypted transition is complete, mark the > pages as "present" again. > > This version of the patch series marks transitioning pages "not present" > only when running as a Hyper-V guest with a paravisor. Previous > versions[1] marked transitioning pages "not present" regardless of the > hypervisor and regardless of whether a paravisor is in use. That more > general use had the benefit of decoupling the load_unaligned_zeropad() > fixup from CoCo VM #VE and #VC exception handling. But the implementation > was problematic for SEV-SNP because the SEV-SNP hypervisor callbacks > require a valid virtual address, not a physical address like with TDX and > the Hyper-V paravisor. Marking the transitioning pages "not present" > causes the virtual address to not be valid, and the PVALIDATE > instruction in the SEV-SNP callback fails. Constructing a temporary > virtual address for this purpose is slower and adds complexity that > negates the benefits of the more general use. So this version narrows > the applicability of the approach to just where it is required > because of the #VC and #VE exceptions being routed to a paravisor. > > The previous version minimized the TLB flushing done during page > transitions between encrypted and decrypted. Because this version > marks the pages "not present" in hypervisor specific callbacks and > not in __set_memory_enc_pgtable(), doing such optimization is more > difficult to coordinate. But the page transitions are not a hot path, > so this version eschews optimization of TLB flushing in favor of > simplicity. > > Since this version no longer touches __set_memory_enc_pgtable(), > I've also removed patches that add comments about error handling > in that function. Rick Edgecombe has proposed patches to improve > that error handling, and I'll leave those comments to Rick's > patches. > > Patch 1 handles implications of the hypervisor callbacks needing > to do virt-to-phys translations on pages that are temporarily > marked not present. > > Patch 2 makes the existing set_memory_p() function available for > use in the hypervisor callbacks. > > Patch 3 is the core change that marks the transitioning pages > as not present. > > This patch set is based on the linux-next20240103 code tree. > > Changes in v4: > * Patch 1: Updated comment in slow_virt_to_phys() to reduce the > likelihood of the comment becoming stale. The new comment > describes the requirement to work with leaf PTE not present, > but doesn't directly reference the CoCo hypervisor callbacks. > [Rick Edgecombe] > * Patch 1: Decomposed a complex line-wrapped statement into > multiple statements for ease of understanding. No functional > change compared with v3. [Kirill Shutemov] > * Patch 3: Fixed handling of memory allocation errors. [Rick > Edgecombe] > > Changes in v3: > * Major rework and simplification per discussion above. > > Changes in v2: > * Added Patches 3 and 4 to deal with the failure on SEV-SNP > [Tom Lendacky] > * Split the main change into two separate patches (Patch 5 and > Patch 6) to improve reviewability and to offer the option of > retaining both hypervisor callbacks. > * Patch 5 moves set_memory_p() out of an #ifdef CONFIG_X86_64 > so that the code builds correctly for 32-bit, even though it > is never executed for 32-bit [reported by kernel test robot] > > [1] https://lore.kernel.org/lkml/20231121212016.1154303-1- > mhklinux@xxxxxxxxxxx/ > > Michael Kelley (3): > x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor > callback > x86/mm: Regularize set_memory_p() parameters and make non-static > x86/hyperv: Make encrypted/decrypted changes safe for > load_unaligned_zeropad() > > arch/x86/hyperv/ivm.c | 65 ++++++++++++++++++++++++++++--- > arch/x86/include/asm/set_memory.h | 1 + > arch/x86/mm/pat/set_memory.c | 24 +++++++----- > 3 files changed, 75 insertions(+), 15 deletions(-) > Wei -- Can this series go through the Hyper-V tree? It's mostly Hyper-V specific code, plus some comments and a minor tweak to a utility function in 'mm'. All comments on earlier versions have been addressed, and it would be good to get some mileage in linux-next before the 6.9 merge window. Michael